Google Interview Question
SDE1sCountry: United States
Interview Type: In-Person
Something like this should be ok..
The program should be pretty much self explanatory
Hope this helps
static String string;
public static void main(String[] args) {
System.out.print("Enter the string : ");
string = (new Scanner(System.in)).nextLine();
for (int i = 0; i < string.length(); i++)
System.out.println("S(" + (i + 1) + "," +
string.length() + ") = " + S(i, string.length()));
for (int i = 0; i < string.length(); i++)
System.out.println("len(LCP(s(" + (i + 1) + "," +
string.length() + "),s)) = " +
LCP(S(i, string.length()), string).length());
}
static String S(int start, int end) {
return string.substring(start, end);
}
static String LCP(String a, String b) {
for (int i = a.length(); i > 0; i--) {
String tmp = a.substring(0, i);
if (b.startsWith(tmp)) return tmp;
}
return "";
}
This solution is not valid because it does not take linear time. The question specifically specified "Find the LCP lengths of *all substrings* in O(n)".
This one clearly is at least O(n^2) time. There's a loop calling LCP n times, and LCP itself both has a loop and a call to startsWith (which is O(n) itself).
C
#include <stdio.h>
#include <memory.h>
#define MAX_STRING 256
int main() {
char s[] ="ababac";
int lcps[MAX_STRING];
for (int i=0; i<strlen(s); i++) {
lcps[i] = 0;
if (s[i] == s[0]) {
lcps[i]++;
}
for (int j=0; j<i; j++) {
if (lcps[j] == i && s[i] == s[lcps[j]]) {
lcps[j]++;
} else if (lcps[j] > 0) {
if(s[i] == s[lcps[j]] && (j+lcps[j]) >= i) {
lcps[j]++;
}
}
}
}
for(int i = 0; i < strlen(s); i++) {
printf("len(LCP(s(%d,%d),s)) = %d\n", i, strlen(s), lcps[i]);
}
return 0;
}
C
#include <stdio.h>
#include <memory.h>
#define MAX_STRING 256
int main() {
char s[] ="ababac";
int lcps[MAX_STRING];
for (int i=0; i<strlen(s); i++) {
lcps[i] = 0;
if (s[i] == s[0]) {
lcps[i]++;
}
for (int j=0; j<i; j++) {
if (lcps[j] == i && s[i] == s[lcps[j]]) {
lcps[j]++;
} else if (lcps[j] > 0) {
if(s[i] == s[lcps[j]] && (j+lcps[j]) >= i) {
lcps[j]++;
}
}
}
}
for(int i = 0; i < strlen(s); i++) {
printf("len(LCP(s(%d,%d),s)) = %d\n", i, strlen(s), lcps[i]);
}
return 0;
}
The algorithm is pretty much to take the substrings of the main string (prefixes) and use two pointers to scan the two strings and move them at the same time until the characters do not match. I present a concise solution in Python below.
def lcpOfAllSubstrings(s):
for i in range(len(s)):
count = 0
for l1, l2 in zip(s, s[i:]):
if l1 != l2:
break
count += 1
print('{}: len(LCP(s({}, {}), s)) = {}'.format(i + 1, i + 1, len(s), count))
# Test code
lcpOfAllSubstrings("ababac")
Output
1: len(LCP(s(1, 6), s)) = 6
2: len(LCP(s(2, 6), s)) = 0
3: len(LCP(s(3, 6), s)) = 3
4: len(LCP(s(4, 6), s)) = 0
5: len(LCP(s(5, 6), s)) = 1
6: len(LCP(s(6, 6), s)) = 0
public String helper(String s1, String s2) {
StringBuilder sb = new StringBuilder();
int mLen = min(s1.length(), s2.length());
for (int i = 0; < mLen; i++) {
if(s1.charAt(i) != s2.charAt(i)) {
break;
}
sb.append(i);
}
return sb.toString();
}
public String LPS(String arr[], int low, int high) {
if(low == high) return arr[low];
if(high > low) {
int mid = low + (high - low) / 2;
String s1 = LPS(arr, low, mid);
String s2 = LPS(arr, mid + 1, high);
return (helper(s1, s2));
}
return null;
}
public String helper(String s1, String s2) {
StringBuilder sb = new StringBuilder();
int mLen = min(s1.length(), s2.length());
for (int i = 0; < mLen; i++) {
if(s1.charAt(i) != s2.charAt(i)) {
break;
}
sb.append(i);
}
return sb.toString();
}
public String LPS(String arr[], int low, int high) {
if(low == high) return arr[low];
if(high > low) {
int mid = low + (high - low) / 2;
String s1 = LPS(arr, low, mid);
String s2 = LPS(arr, mid + 1, high);
return (helper(s1, s2));
}
return null;
}
A solution in Go. I'm not convinced it's actually O(n) time though. I think it ends up being O(kn) where k is the average length of a substring (excluding s being a substring of itself).
That's because on each loop through, on average it loops through 'k' candidates to check for progress in my solution.
I don't know how you'd do this in O(n); none of the current solutions appear to be that efficient.
package main
import (
"fmt"
"sort"
"strings"
)
func main() {
s := "ababac"
result := LongestPrefixesFrom(s)
fmt.Println(FormatResult(s, result))
}
// Takes a string. Returns a map from start index to length for substrings of s
// that are also prefixes of s.
func LongestPrefixesFrom(s string) map[int]int {
// HashMap of strings from their start point to length that are not ongoing
// in our current scan.
// Pre-insert 's' since it's always a substring of itself of length s
res := map[int]int{
0: len(s),
}
// HashSet of candidate. As we move through indexes, each candidate in this
// set is valid for index i-1, starting at the index recorded in the hashset.
// That is to say, with the example string 'ababc', the hashset might contain
// '2' to indicate a valid substring starts at index '2'.
// On the next index, 3, it would be checked this substring is still valid by
// checking if the value at index 3 (1 away from the start of 2) matches the
// value 1 from the start.
// In this example it does match. The check will repeat at index 4 (2 away
// from the start at index 2). Since the value at index 2 doesn't match, that
// means the last valid character in the substring was the previous one, so
// it can now be removed from the currently tracked substrings and moved into
// the result set.
candidates := map[int]struct{}{}
for i := 0; i < len(s); i++ {
for candidate := range candidates {
// candidate is something like 'index 5' indicating that a prefix of s starts at 5.
// To check if this candidate is still valid from 5 to i, we just check
if s[i-candidate] != s[i] {
delete(candidates, candidate)
res[candidate] = i - candidate
}
}
if s[0] == s[i] {
candidates[i] = struct{}{}
} else {
res[i] = 0
}
}
// add any existing candidates; those run until the end of the string
for candidate := range candidates {
res[candidate] = len(s) - candidate
}
return res
}
func FormatResult(s string, res map[int]int) string {
parts := []string{}
for key, val := range res {
// +1 for 1 index instead of 0 index to match the question
str := fmt.Sprintf("len(LCP(s(%v,%v)) = %v", key+1, len(s), val)
parts = append(parts, str)
}
// sort because map iteration is random in go
sort.Strings(parts)
return strings.Join(parts, "\n")
}
Hi DaveX, could we maybe try to use a Trie for this situation?. Like insert the word "ababac" and just take substrings of it by walking down the trie? I could be wrong, but wouldn't it be O(n) time then? What are your thoughts?
Let's say you have the worst case string, which is "aaaaa...".
Your trie is now just "a -> a -> a -> a ..." (which is identical your list of "a"s really).
Let's look at the runtime of lookups in that trie: To lookup the first character, you walk n characters in the trie. For the second, you walk n-1 characters, then n-2, n-3, etc.
That ends up being the sum of 0..n, which is O(n^2).
I don't think a trie is actually any more efficient than searching the list itself because there's only one prefix to search, not a bunch, and tries only really help if there are a bunch of prefixes.
Python3
inp = 'ababac'
sdict = {}
for i in range(len(inp)):
sdict[i] = inp[i:]
count = 0
LCP = {}
for sub_s in sdict.values():
for i in range(1,len(sub_s)+1):
if sub_s[:i] == inp:
LCP[sub_s] = len(inp)
count = 0
break
elif inp.startswith(sub_s[:i]):
count += 1
else:
LCP[sub_s] = count
count = 0
break
for key,value in LCP.items():
print('{}: {}'.format(key,value))
import java.util.*;
class subseq
{
public static void main(String[] args)
{
String Sen="ababac";
sseq(Sen);
}
public static void sseq(String S)
{
int length = S.length();
String[] seq = new String[length];
int values[]=new int[length];
int l1=0;
String s="";
int count=0;
int c=0;
for(int i=0;i<length;i++)
{
seq[i]=S.substring(i,length);
}
for(int i=0;i<length;i++)
{
s=seq[i];
l1=s.length();
count=0;
for(int j=0;j<l1;j++)
{
if(s.charAt(j)==S.charAt(j))
{
count=count+1;
}
else
{
break;
}
}
values[c++]=count;
}
for(int i=0;i<values.length;i++)
{
System.out.println(seq[i]+"::"+values[i]);
}
}
}
I'm a veteran, but it made me think a bit of this problem that at first glance can not be done in O (n), yet ...
So, the good news is that it can be done in O(n)... the bad news is that I would not have passed the interview :)), because it took me a while. Today I had a little time and find this idea using KMP. The idea came from the following statement: "the longest prefix which is a good suffix"... exactly what we need.
So, here are the steps:
1. Do the lps table (this is done in O(n) - see KMP algorithm), plus set the oldest parent for each suffix.
2. Rotate all ascending sub-arrays in the lps table ( O(n) )
3. If the parent for a chr is not 0 then set the lps[chr] = 0 (also done in O(n) ) and for the first char set len(s) - the longest
So, as you can see you need 3 traversal for the initial string no matter how long it is (how big is n)... thus O(3n) = O(n)
Here is an example:
step 1:
s = a b a b a c
lps = 0 0 1 2 3 0
parent = 0 1 0 1 0 6
- this is done in O(n)
step 2:
s = a b a b a c
lps = 0 0 3 2 1 0
parent = 0 1 0 1 0 6
step 3:
s = a b a b a c
lps = 6 0 3 0 1 0
parent = 0 1 0 1 0 6
Done.
So, here is the implementation in python:
"""
Problem:
--------
Given a string s contains lowercase alphabet, find the length of the Longest common Prefix of all substrings in O(n)
For example
s = 'ababac'
Then substrings are as follow:
1: s(1, 6) = ababac
2: s(2, 6) = babac
3: s(3, 6) = abac
4: s(4, 6) = bac
5: s(5, 6) = ac
6: s(6, 6) = c
Now, The lengths of LCP of all substrings are as follow
1: len(LCP(s(1, 6), s)) = 6
2: len(LCP(s(2, 6), s)) = 0
3: len(LCP(s(3, 6), s)) = 3
4: len(LCP(s(4, 6), s)) = 0
5: len(LCP(s(5, 6), s)) = 1
6: len(LCP(s(6, 6), s)) = 0
"""
class LCP:
@staticmethod
def build_lps_asc_parent_tables(p:str):
lps = [0] * len(p)
asc = [-1] * len(p)
parent = [i for i in range(0, len(p))]
k = -1
lps[0] = -1
parent[0] = 0
for i in range(1, len(p)):
while (k >= 0 and p[k+1] != p[i]):
k = lps[k]
if p[k+1] == p[i]:
k += 1
# longest prefix which is also a suffix for p[0:i]
lps[i] = k;
parent[i] = parent[k]
if (asc[i-1]<0 and k>=0):
asc[i] = i
elif (asc[i-1]>=0 and k>=0):
asc[i] = asc[i-1]
return (lps,asc,parent)
@staticmethod
def rotate_asc_subarr(a: list, asc:list):
i = len(asc) - 1
while i>=0:
if asc[i] >= 0:
# rotate
end = i
start = asc[i]
while (start < end):
a[start],a[end] = a[end],a[start]
end -=1
start +=1
# jump to the next one
i = asc[i] - 1
else:
i -= 1
@staticmethod
def filter_lsp(lsp, parent):
for i in range(0, len(lsp)):
if parent[i] != 0:
lsp[i] = 0
else:
lsp[i] += 1
lsp[0] = len(lsp)
return lsp
class test:
def do_test_1(self):
s = "ababac"
(lsp,asc,parent) = LCP.build_lps_asc_parent_tables(s)
LCP.rotate_asc_subarr(lsp, asc)
lcp = LCP.filter_lsp(lsp, parent)
print(lcp)
t = test()
t.do_test_1()
I see that CLRS cut the string search algorithms from the last edition: KMP, Boyer-Moore, and Rabin-Karp. I also wrote a book (Blueprints of Common Algorithms: A simplified approach to the analysis and implementation of common algorithms) and I didn't include them in it... but it seems that they are needed, so I'm thinking to add them in the next editions.
Create the suffix_arr -> O(n)
Sort it -> O(nlogn)
Build LCP array
max = Integer.MAX_VALUE;
LCP[0] =0;, j =1 to n ; and i =0 to n-1
for all suffix_arr element O(n-1)
match suffix_arr[i] with suffix_arr[i+1] and count the number of matching characters = count -> O(K) where k is max. number of matching characters between 2 suffixes
max = Math.max(max,count)
return max
I think the maximum time that we are reaching here is O(n).
Comments?
#In Python
#Is this correct?
def cadena(string):
lista=[]
for i in range(len(string)):
lista.append(string[i:len(string)])
j=0
i=0
resultados=[]
while j<len(lista):
if i<len(lista[j]):
if string[i]==lista[j][i]:
i+=1
else:
j=j+1
resultados.append(i)
i=0
else:
resultados.append(i)
j=j+1
i=0
for i in range(len(string)):
print("len(LCP(s("+str(i+1)+","+str(len(string))+")s)="+str(resultados[i]))
s="ababac"
cadena(s)
Is this correct?
{{
def cadena(string):
lista=[]
for i in range(len(string)):
lista.append(string[i:len(string)])
j=0
i=0
resultados=[]
while j<len(lista):
if i<len(lista[j]):
if string[i]==lista[j][i]:
i+=1
else:
j=j+1
resultados.append(i)
i=0
else:
resultados.append(i)
j=j+1
i=0
for i in range(len(string)):
print("len(LCP(s("+str(i+1)+","+str(len(string))+")s)="+str(resultados[i]))
s="ababac"
cadena(s)
}}
func lcpsubstring(s string) []int {
lcp := make([]int, len(s))
lcp[0] = 0
for i, j := 0, 1; j < len(s); j++ {
if s[i] == s[j] {
i++
lcp[j] = i
} else {
i = 0
}
}
max := 0
c := 0
for i := len(lcp) - 1; i >= 1; i-- {
if s[i] == s[0] {
c++
if lcp[i] == 1 {
lcp[i] = max
if c > max {
lcp[i] = c
}
max = 0
} else {
if lcp[i] > max {
max = lcp[i]
}
lcp[i] = c
}
} else {
c = 0
}
}
lcp[0] = len(s)
return lcp
}
func lcpsubstring(s string) []int {
lcp := make([]int, len(s))
lcp[0] = 0
for i, j := 0, 1; j < len(s); j++ {
if s[i] == s[j] {
i++
lcp[j] = i
} else {
i = 0
}
}
max := 0
c := 0
for i := len(lcp) - 1; i >= 1; i-- {
if s[i] == s[0] {
c++
if lcp[i] == 1 {
lcp[i] = max
if c > max {
lcp[i] = c
}
max = 0
} else {
if lcp[i] > max {
max = lcp[i]
}
lcp[i] = c
}
} else {
c = 0
}
}
lcp[0] = len(s)
return lcp
}
Can anyone help me to understand the question. Any other example other than the described one?
The problem as it formulated does not make any sense. Longest common prefix of "all substrings" will be 0 as empty substring is also a substring. If considering onlynon-empty substring it will be not more than 1, because of substrings of 1 character.
From the example it seems that the right problem is "Find substring s of string S with the longest common prefix with S". And even this does not make sense as you see in auther's example - off course S is substring of S and it has the longest prefix.
So, the right problem that author was asked most likely was "Find substring s of string S, not equal to S with the longest common prefix with S".
It is hard to blame author if it was Google phone interview, they often formulate problems verbally and it maybe very difficult to understand the exact problem.
You can do this in O(n^2) time using Trie.
Here is the algo;
We need to build trie along with smartLCP map which consist below things
1. start index, lcpLenght -> this will help us to return the query in constant time
How to build them in O(n^2);
Well here it is ;
1. start from the original string and put it into the trie, and insert in smartLCP (0, len(input))
2. now, start taking substring from index 1 onwards;
before inserting it into trie, check that the first character of this substring is same as first char
of input string or not; if not discard
if yes, start inserting into trie (may go to existing branch or create a new branch), as you
go down, check if it starting a new branch or continued it, get the length of matching(prefix)
and put in smartLCP
Trie would look like this for given above example; SmartLCP{(0,6), (2,3), (4,1)}
a
c b
a
b c
a
c
now its constant time to say the length of each lcp using smartLcp array.
Complexity: Building trie for a string is O(n) since there are n^2 substrings (which we can't avoid)
so complexity of creating trie is O(n^2) in worst case ( worst case occurs when all the character is same)
Simple Z.
public static int[] computeZ(String s) {
int l = 0; r = 0;
int [] Z = new int[len];
int len = s.length();
for (int k =0 ; k < len; k++ ) {
int j;
if (k < r) {
j = (z[k-l] < (r-k)) ? z[k-l] : (r-k)
} else {
j = 0;
}
while (k + j < len) {
if (s.charAt(k+j) == s.charAt(j)) {
j++;
} else {
break;
}
}
if (k + j > r) {
l = k;
r = k + j;
}
}
Z[0] = len;
return Z;
}
This problem is also known as z-algorithm, which runs in linear time. Here's a C++ implementation.
- coderick14 February 20, 2019