Google Interview Question
Software Engineer / DevelopersTeam: SRE
Country: India
Interview Type: Phone Interview
building a suffix array would be more feasible and easier approach.
time complexity will be n*Log(n).
How does this solve for 'repeating' substring? Can anyone please explain. Thanks in advance.
I implemented it using KMP longest prefix finding approach.
Basically, I am finding the longest prefix of all substrings of the given string, which is proper suffix of the substring.
More clearly, for any substring s[i] ... s[j], I find the longest prefix s[i] ... s[k] which is proper suffix of s[i] ... s[j].
When, I have computed all such values, I go over them and check if the prefix found is a repeating string (it is, if the length of prefix is atleast half the length of the substring considered), and return the longest one.
Since I have to find for all strings, the running time is O(n^2). Is there any better time complexity? I am think maybe I can use the result of pi[0][x] ... pi[i-1][x] while computing pi[i][x], but having thought through it.
static string FindLongestRepeationgSubstring(string s)
{
int n = s.length();
int pi[n][n];
for (int i = 0; i < n; ++i)
pi[i][i] = i - 1;
for (int i = 0; i < n; ++i)
{
for (int j = i + 1; j < n; ++j)
{
int k = pi[i][j-1];
while (k >= i && s[j] != s[k+1]) k = pi[i][k];
if (s[j] == s[k+1]) k++;
pi[i][j] = k;
}
}
for (size_t i = 0; i < n; ++i)
{
for (size_t j = i; j < n; ++j)
{
cout << pi[i][j] << " ";
}
cout << endl;
}
int ls = -1;
int le = -1;
int l = 0;
for (int i = 0; i < n; ++i)
{
for (int j = i; j < n; ++j)
{
int tempL = pi[i][j] - i + 1;
if ((pi[i][j] >= j - tempL) && tempL > l)
{
l = tempL;
ls = i;
le = pi[i][j];
}
}
}
if (l > 0)
return s.substr(ls, l);
else
return "";
}
@gimli: i didn't understand the following:
"it is, if the length of prefix is atleast half the length of the substring considered".
We are looking for a repeated string .Why half length?
I don't understand why "aba" and "bab" are the longest repeating strings; I would think of "abab" as the longest one.
What's repeated here is the substring "ab". "abab" is a repetition of "ab". The meaning of "longest" repeating substring means the longest substring consisted of a unique pattern that is repeated later on. "ab" is the longest of a unique pattern that is repeated.
public static void main(String[] args) {
String input = "ababab";
List<String> result = getLongestString(input);
for(String str : result)
System.out.println(str);
}
private static List<String> getLongestString(String input) {
ArrayList<String> result = new ArrayList<String>();
for(int i=input.length()-1; i>0; i--) {
String tempStr = null;
for(int j=0; j<=input.length()-i; j++) {
String temp = input.substring(j, i+j);
tempStr = input.substring(0, i+j-1);
if(tempStr.indexOf(temp) > -1)
result.add(temp);
}
if(result.size() > 0)
return result;
}
return result;
}
public static void main(String[] args) {
String input = "ababab";
List<String> result = getLongestString(input);
for(String str : result)
System.out.println(str);
}
private static List<String> getLongestString(String input) {
ArrayList<String> result = new ArrayList<String>();
for(int i=input.length()-1; i>0; i--) {
String tempStr = null;
for(int j=0; j<=input.length()-i; j++) {
String temp = input.substring(j, i+j);
tempStr = input.substring(0, i+j-1);
if(tempStr.indexOf(temp) > -1)
result.add(temp);
}
if(result.size() > 0)
return result;
}
return result;
}
see Suffix_array in wikipedia
Construct a sorted suffix array with lcp in O(n) time. (It seems impossible to code this during interview).
Then for each suffix i, the maximum repeated substring is the lcp with i-1 or i+1. Since lcp is already computed, so for each suffix i, we can compute the repeated substring in O(1) time. Thus totally O(n) time.
char * longest_substring(char * string, int length)
{
if(string == NULL)
{
return NULL;
}
char * result = NULL;
int begin_str, end_str, count, max_begin, max_end, max_count=0;
int max_length = 0;
int end =0;
for(begin_str = 0; begin_str < length-1; begin_str ++)
{
for(end_str = begin_str; end_str < length-1; end_str++)
{
count = 0;
int i,k;
k=end_str +1;
end = 0;
do{
for(i = begin_str; i <=end_str && k < length && string[k] == string[i]; i ++, k++);
if(i > end_str)
{
count ++;
end = k-1;
}
}while(i > end_str);
if(end-begin_str > max_count)
{
max_length = (end_str - begin_str);
max_count = k-begin_str;
max_begin=begin_str;
max_end = end;
}
}
}
result = new char[max_end-max_begin+1];
int k =0;
for(int i = max_begin; i <=max_end; i ++,k++)
{
result[k] = string[i];
}
result[k] = '\0';
return result;
}
This use brutal force method, which go for all combinations. the running time would be O(n^3)
public String findRepeat(String input){
if(input == null || input.length() < 2){
System.out.println("String is null or string length is less then 2");
return null;
}
int longestStart = 0;
int longestEnd = 0;
for(int repeatLen = input.length()/2; repeatLen > 0; repeatLen--){
for(int i=0; i + repeatLen*2 <= input.length(); i++){
int start = i;
int end = i;
for(int curr = i; curr + repeatLen*2 <= input.length(); curr+=repeatLen){
if(compareStr(input.substring(curr, curr+repeatLen), input.substring(curr+repeatLen, curr+repeatLen*2))){
if(start == end){
start = curr;
end = curr+repeatLen*2;
if(end - start > longestEnd - longestStart){
longestStart = start;
longestEnd = end;
}
}else if(end >= curr){
end = curr+repeatLen*2;
if(end - start > longestEnd - longestStart){
longestStart = start;
longestEnd = end;
}
}else{
start = curr;
end = curr+repeatLen*2;
}
}
}
}
}
if(longestStart < longestEnd){
return input.substring(longestStart,longestEnd);
}
return null;
}
public boolean compareStr(String a, String b){
if( a == null || b == null){
System.out.println("One of string is null!");
return false;
}
if(a.length() != b.length()){
return false;
}
for(int i=0; i<a.length(); i++){
if(a.charAt(i) != b.charAt(i)){
return false;
}
}
return true;
}
* Longest repeated substring
** en.wikipedia.org/wiki/Longest_repeated_substring_problem, linear time solution using suffix tree
** KMP partial match function solution seems to give an O(n^2) solution only
** Suffix array can solve this problem in O(nlgn)
*** algs4.cs.princeton.edu/63suffix/
*** algs4.cs.princeton.edu/63suffix/LRS.java.html
1. Build the suffix tree T.
- dumbhead April 05, 20122. Depth of a node v is measured by the number of characters from root to v. Calculate the depth of each node as : depth(v) = depth(v.parent) + length(v)
3.Now we need to look for the deepest branching node in the tree T.