Amazon Interview Question
SDE1sCountry: India
Interview Type: In-Person
why are we starting from index 1 .....say you want to search a name "shashank" now assume uniform distribution of words as null hypothesis and go directly to supposed starting index of 's'.now keep correcting hypothesis. For example if we reach "m" instead of "s" then assume uniform distribution of words from m to z for remaining words and jump directly to supposed index of "s".
If the dictionary itself is sorted, this can be a simple binary search. Look up the 0.5 billion index, and see if the word should lie in the first half of the dictionary or the second. And then iterate this process, each time cutting the dictionary size by half. If the word exists, you'll get it in O(logn). If not, that also will be known in O(logn). Only thing is that you'd need to implement a proper comparator for strings.
string WordTobeSearch = "Repeat";
string str_left, str_right;
1. Apply binary search for word from index start to end using at index = 1, 2, 4, 8, 16, ....,i, 2i, ....end.
if(WordTobeSearch ==getWord(i))
{
return i; //index of WordTobeSearch
} else{
str_left = getWord(i); // str_left < WordTobeSearch
str_right = getWord(2i); //str_right > WordTobeSearch
}
apply procedure 1. from index i to 2i. until element is not find. or search space is exhausted.
smallchallenges,
if the dictionary is not sorted, how is this approach going to narrow down the search?
say getWord(i) < word < getWord(2i)? that doesn't provide anything because the dictionary is not sorted.
qxixp: the idea is to work on the index (i=1...billion) and not the file storing the words. Hope this helps.
Following code causes mlogn, where m is length of given string. This can be optimized little without any change in the complexity.
public static int binarySearch(int i, int start, int end)
{
if(i == givenWord.length())
return -1;
int mid = (start+end)/2;
for(int j = 0; j < i; j++)
{
if(dictionary[mid].charAt(j) > givenWord.charAt(j))
return binarySearch(j, start, mid);
else if(dictionary[mid].charAt(j) < givenWord.charAt(j))
return binarySearch(j, mid, end);
}
if(dictionary[mid].charAt(i) > givenWord.charAt(i))
{
return binarySearch(i,start, mid);
}
else if(dictionary[mid].charAt(i) < givenWord.charAt(i))
{
return binarySearch(i,mid, end);
}
else if(dictionary[mid].charAt(i) == givenWord.charAt(i))
{
if(i == givenWord.length()-1)
return mid;
return binarySearch(++i, start, end);
}
return 0;
}
1. Binay search will give O(log n) soln:
maxidx = maximum index of the dictionary
W: Word whose idx is to be found
string W1 = getWord(maxIdx/2);
int temp = strncmp(W1, W);
if (temp > 0){
//W1 > W. search in upper half
findIdx(maxidx/2, W);
}
else if (temp < 0){
//find in lower half
findIdx((1.5*maxidx), W);
}
else{
//word matched, return this idx
return maxidx/2;
}
If we know the size of the dictionary then a straight forward BINARY SEARCH is perfect enough.
- ninhnnsoc February 28, 2014If we don't know the size, instead we're only given the query method, then we need to find the index range [st, ed] first, where getWord[st] < theGivenWord < getWord[ed], by REPEATED DOUBLING.
So, try to query at index 1, 2, 4, 8, 16, ..., 2^k,... and find the [st, ed].
After knowing [st,ed], do binary search...
The overall time is O(logn) still.