Google Interview Question
Software Engineer / DevelopersCountry: India
1. find the biggest character in the string O(n)
eg: adecb -->e
2. build a boolean array, which has the length of the string and traverse the string, if the
character is biggest character then its corresponding boolean is true,otherwise false
eg: abcee
F,F,F,T,T
3. find the longest substring which only contains the biggest character O(n)
eg: abceeedcabeeecdd
eee eee
Notes: this step may return multiple substrings
4.compare the next character for the substrings we have in step 3
eg: abceeedcabeeecddeeemfefeeemk
eeed eeec eeem eeem
we compare 'd','c','m', 'm' and get the biggest, and continue to compare the next character
eg: compare 'f' with 'k'
we keep comparing until one of the following case is true
a. there is only one biggest substring
eg 1:
eeemeeekeeez
m k z
-->eeez is the answer
eg 2:eeemeee
m (empty or null)
-->eeem is not the answer, the answer is eeemeee, the returned substring must be a suffix.
c. one of the substrings we compared hits the biggest character
eeemeeemn
m m
since 'm'=='m' we compare the next,
the next character for eeem is 'e', which is the biggest, so we return eeemeeemn.
Nice solution. But there is something I do not understand:
how do you compare the characters in step 4. I am asking since I can think of only O(n/2)log(n/2) complexity.
e.g. abanavacam
p.s
In non general case: If you have only 26 symbols you can simply use use an array cnt with size = 26 and keep the indexs of the first elements. This will be constant time and your approach will be O(n). But in general case it seems to me like O(nlogn)
In "c. one of the substrings we compared hits the biggest character", your result will be inconclusive if there are multiple places where the biggest character is hit. E.g. with "zzzbzzzbzzzbzzzazzzbzzzbzzz", we have multiple strings where zzz is followed by bz, so we have to keep looking at all of these, and keep repeating until we're down to one starting location under consideration, whether the new character in the strings we are considering is a new character or a repeat of the old character.
This means the "we keep comparing until..." step can be repeated O(n) times, making this an O(n^2) algorithm.
I am really confused here. How is eee the maximum element when the string contains m,k,z?
How about divide and conquer?
Assume the string has length n. Let the string be S. Look at S1 = S(0, \lfloor n/2 \rfloor) and S2 = S(\lfloor n/2 \rfloor + 1, n-1) ..Suffix of S2 = Suffix of S. Suffix of S1 + S2 = Suffix of S.
Find the largest suffix of S1 and S2. Note that the largest suffix will be unique since each suffix has different length. Add S2 to the largest suffix of S1 and compare this newly formed suffix of S with largest suffix of S2. This comparison can be done is O(n) time.
So, our time equation is
T_n = 2 T_n/2 + O(n)... This gives an O(n log n ) solution, Oh well..
I dont think this works. Lets take "babbaa" as an example.
Correct answer should be : bbaa
Follow your algorithm:
1. break babbaa into bab baa
for bab, if we break it, the resulting two substrings are ba and b; thus we need to choose between the two largest suffix bab(ba+b) and b. bab is lexicographicaly larger, so we return bab;
similarly, the maximum suffix of baa is baa. now we compare babbaa and baa, the result is babbaa.
However, the correct answer should be bbaa.
Can someone tell me why this isn't correct?
public String run(String word) {
String currentPrefix = "";
String answer = "";
for (int j = word.size() - 1; j >=0; j++) {
currentPrefix = word.get(j) + currentPrefix;
if (answer.isEmpty() || currentPrefix.compareTo(answer) > 0) {
answer = currentPrefix;
}
}
return answer;
}
Search for Maximum Suffix problem on Yahoo and you will get some research papers for doing it in linear time.
Scan from left to right and find the starting index of lexicographically largest string at each index. Interesting thing is that you could tell the lexicographically largest string at index i, based on the lexicographically greatest string at index i-1 with constant number of comparisons.
If A[i] is the current character and maxlex is the starting index of lexicographically largest string till A[i-1]. It gets a bit trickier when A[i]=A[maxlen].
Could be done in O(n) time and constant space.
This would work in O(n) time.
Key ideas are
- the answer should start with the largest character
- so search from right to left, remember the index where we encounter
the largest character so far as the running best(=starting point of the max substr)
- then it's all about tie-breaking. when tied, instead of updating the running
best, simply count the offset until tie is broken.
- while tie-breaking, if a even larger character is encountered, update the
running best right away. if the same character is encountered, we do
nothing as the current candidate is always larger.
- last, if tie is not resolved till the end, longer one beats so we can keep the
current best.
#include <stdio.h>
#include <stdlib.h>
void main( int argc, char** argv )
{
char* s = argv[1];
int answer = -1;
int offset = -1;
int i;
for( i = 0; i < strlen( s ); i++ )
{
if( answer == -1 ||
s[answer] < s[i] )
{
answer = i;
offset = -1;
}
else if( offset >= 0 )
{
offset++;
// we're tie-breaking
if( s[answer+offset] < s[i] )
{
answer = i-offset;
offset = -1;
}
else if( s[answer+offset] > s[i] )
{
offset = -1;
}
}
else if( s[answer] == s[i] )
{
offset = 0;
}
}
printf( "%d\t%s\n", answer, s+answer );
}
coming from a paper: "A note on a simple computation of the maximal suffix
of a string.pdf" by Zbigniew Adamczyk , Wojciech Ryttera
int ComputeMaxSufPos(string &str)
{
int i = 0, j = 1;
size_t n = str.size();
string sub1, sub2;
while (j < n)
{
int k = 0;
//sub1 = = str.substr;;
//sub2 = str.substr;
while (j + k < n - 1 && str.substr(i + k) == str.substr(j + k))
{
k++;
}
if (str.substr(i + k) < str.substr(j + k))
{
i += (k / abs(j - i) + 1)*(j - i);
j = i + 1;
}
else
{
j = j + k + 1;
}
}
return i;
}
There is a simple/beautiful an algorithm on this problem. Imagine in an interactive game, two opponents A and B work against each other, to find the max suffix of a given string s. Initially A picks suffix s[i..], and B picks suffix s[j..]
i: _____X
j: _____Y
A judge compares two suffix(by matching them) and find there is a mismatch at certain position, as shown in the fig above.
Without loss of generality, we assume X > Y, then B is lost in this round. So he has to improve his solution in order to beat A in next round. If B is smart, he will not pick any suffix at position j, j + 1, ..., until the mismatched position, because s[j..] is beaten by s[i..] and s[j+1..] is beaten by s[i+1..]. So next suffix B will come up is S[k..], k is the position after mismatch..
After couple rounds, either A or B will run out choices. When this happens, the current suffix that the winner holds is the max suffix.
Here is the algorithm in pseudo code:
int i = 0, j = 1, k;
while (i < s.size() && j < s.size()) {
for (k = 0; i + k < s.size() && j + k < s.size() && s[i + k] == s[j +k]; ++k) { //judge
}
if (i + k == s.size()) return j; //A is finally lost
if (j + k == s.size()) return i; //B is finally lost
if (s[i + k] > s[j + k]) //B is lost in this round
j = j + k + 1;
else //A is lost in this round
i = i + k + 1;
}
The algorithm is linear because after each round,k possible positions are eliminated from either A, or B.
There is a simple/beautiful an algorithm on this problem. Imagine in an interactive game, two opponents A and B work against each other, to find the max suffix of a given string s. Initially A picks suffix s[i..], and B picks suffix s[j..]
i: _____X
j: _____Y
A judge compares two suffix(by matching them) and find there is a mismatch at certain position, as shown in the fig above.
Without loss of generality, we assume X > Y, then B is lost in this round. So he has to improve his solution in order to beat A in next round. If B is smart, he will not pick any suffix at position j, j + 1, ..., until the mismatched position, because s[j..] is beaten by s[i..] and s[j+1..] is beaten by s[i+1..]. So next suffix B will come up is S[k..], k is the position after mismatch..
After couple rounds, either A or B will run out choices. When this happens, the current suffix that the winner holds is the max suffix.
Here is the algorithm in pseudo code:
int i = 0, j = 1, k;
while (i < s.size() && j < s.size()) {
for (k = 0; i + k < s.size() && j + k < s.size() && s[i + k] == s[j +k]; ++k) { //judge
}
if (i + k == s.size()) return j; //A is finally lost
if (j + k == s.size()) return i; //B is finally lost
if (s[i + k] > s[j + k]) //B is lost in this round
j = j + k + 1;
else //A is lost in this round
i = i + k + 1;
}
The algorithm is linear because after each round,k possible positions are eliminated from either A, or B.
The question required a O(n) algorithm so we can scan the input only constant times and it is reasonable to think to start the scan from the end. Here is sample code
public static String maxSuffix(String s) {
if (s == null || s.length() < 2) {
return s == null ? null : new String(s);
}
char[] a = s.toCharArray();
int[] M = new int[a.length];
M[a.length - 1] = a.length - 1;
for (int i = a.length - 2; i >= 0; i--) {
if (a[i] >= a[M[i + 1]]) {
M[i] = i;
} else {
M[i] = M[i + 1];
}
}
return s.substring(M[0], s.length());
}
Where M[i] means the beginning index of the max suffix for sub-string starting at position i, 0 <= i < s.length.
if we can use extra space create suffix tree and the right most suffix in that is the biggest suffix.
- jeff September 28, 2012