
Amazon Interview Question for SDE1s

There is a dictionary of a billion words, and one method is provided:
String getWord(int index); Given an index, it returns the String at that index.
Now a word is given to us and we have to find its index. An O(log n) solution was required.
- pavan February 27, 2014 in India
Amazon SDE1 String Manipulation


Country: India
Interview Type: In-Person


If we know the size of the dictionary, then a straightforward BINARY SEARCH is enough.

If we don't know the size and are only given the query method, then we first need to find an index range [st, ed] where getWord(st) <= theGivenWord <= getWord(ed), by REPEATED DOUBLING.
So, query at index 1, 2, 4, 8, 16, ..., 2^k, ... until the word at 2^k is no longer smaller than the given word (or the index runs past the end of the dictionary). Then st = 2^(k-1) and ed = 2^k.

After finding [st, ed], do a binary search within it.

The overall time is still O(log n): about log n probes for the doubling, plus a binary search over a range of at most n entries.
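The two phases above can be sketched in Java as follows. This is only an illustration under stated assumptions: the question does not say what getWord does past the end of the dictionary, so the sketch assumes it returns null there. It also uses 0-based indices, and the class name GallopSearch and the IntFunction wrapper are hypothetical.

```java
import java.util.function.IntFunction;

public class GallopSearch {
    /**
     * Returns the index of target in a sorted dictionary, or -1 if absent.
     * ASSUMPTION: getWord returns null for any index past the end.
     */
    static int indexOf(IntFunction<String> getWord, String target) {
        // Phase 1: repeated doubling. Stop at the first probe that is past
        // the end (null) or holds a word >= target.
        long lo = 0;
        long hi = 1;
        while (hi < Integer.MAX_VALUE) {
            String w = getWord.apply((int) hi);
            if (w == null || w.compareTo(target) >= 0) {
                break;
            }
            lo = hi + 1;                               // everything up to hi is < target
            hi = Math.min(hi * 2, Integer.MAX_VALUE);  // cap so the int index cannot overflow
        }
        // Phase 2: binary search in [lo, hi]; null counts as "greater than any word".
        while (lo <= hi) {
            long mid = (lo + hi) / 2;
            String w = getWord.apply((int) mid);
            int cmp = (w == null) ? 1 : w.compareTo(target);
            if (cmp == 0) {
                return (int) mid;
            }
            if (cmp > 0) {
                hi = mid - 1;
            } else {
                lo = mid + 1;
            }
        }
        return -1;
    }
}
```

If getWord threw an exception instead of returning null for an out-of-range index, the same structure would work with the null checks replaced by a try/catch.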

- ninhnnsoc February 28, 2014

Nice solution. The repeated doubling is called gallop search. :)

- Killedsteel March 02, 2014

Why are we starting from index 1? Say you want to search for the name "shashank". Assume a uniform distribution of words as the null hypothesis and go directly to the supposed starting index of 's'. Then keep correcting the hypothesis. For example, if we land on "m" instead of "s", assume a uniform distribution from m to z over the remaining words and jump directly to the supposed index of "s".
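This idea is usually called interpolation search. A minimal Java sketch follows, assuming the dictionary size is known and the words are lowercase a-z. The key function, which turns the first six letters into a fraction, is a hypothetical helper invented for this illustration. Correctness never depends on the guess, because every decision is confirmed by a real string comparison.

```java
import java.util.function.IntFunction;

public class InterpolationSearch {
    // Maps the first 6 letters of a word to a fraction in [0, 1).
    // ASSUMPTION: words are lowercase a-z; any other character counts as 0.
    static double key(String w) {
        double k = 0;
        double scale = 1.0 / 27;
        for (int i = 0; i < 6; i++) {
            int d = 0;
            if (i < w.length() && w.charAt(i) >= 'a' && w.charAt(i) <= 'z') {
                d = w.charAt(i) - 'a' + 1;
            }
            k += d * scale;
            scale /= 27;
        }
        return k;
    }

    // ASSUMPTION: size is known and getWord is valid for 0..size-1.
    static int indexOf(IntFunction<String> getWord, int size, String target) {
        int lo = 0;
        int hi = size - 1;
        while (lo <= hi) {
            String loWord = getWord.apply(lo);
            String hiWord = getWord.apply(hi);
            if (target.compareTo(loWord) < 0 || target.compareTo(hiWord) > 0) {
                return -1;                      // target lies outside the remaining range
            }
            double kLo = key(loWord);
            double kHi = key(hiWord);
            // Guess the position assuming words are spread uniformly between
            // loWord and hiWord; fall back to the midpoint if the keys tie.
            double frac = (kHi > kLo) ? (key(target) - kLo) / (kHi - kLo) : 0.5;
            frac = Math.max(0.0, Math.min(1.0, frac));
            int mid = lo + (int) ((hi - lo) * frac);
            int cmp = getWord.apply(mid).compareTo(target);
            if (cmp == 0) {
                return mid;
            }
            if (cmp < 0) {
                lo = mid + 1;                   // correct the hypothesis upward
            } else {
                hi = mid - 1;                   // correct the hypothesis downward
            }
        }
        return -1;
    }
}
```

On evenly spread data this tends to need fewer probes than binary search, but on skewed data it can degrade toward a linear number of probes, so it does not by itself guarantee the O(log n) bound the question asks for.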

- shashank May 25, 2015

If the dictionary itself is sorted, this can be a simple binary search. Look up the 0.5 billion index, and see if the word should lie in the first half of the dictionary or the second. And then iterate this process, each time cutting the dictionary size by half. If the word exists, you'll get it in O(logn). If not, that also will be known in O(logn). Only thing is that you'd need to implement a proper comparator for strings.

- AK February 27, 2014

string WordTobeSearch = "Repeat";
string str_left, str_right;
1. Search for the word between index start and end by probing at index = 1, 2, 4, 8, 16, ..., i, 2i, ..., end.
if(WordTobeSearch == getWord(i))
{
return i; //index of WordTobeSearch
} else{
str_left = getWord(i); // str_left < WordTobeSearch
str_right = getWord(2i); //str_right > WordTobeSearch
}
2. Apply procedure 1 again on the range from index i to 2i, until the element is found or the search space is exhausted.

- _akt February 27, 2014

Good one. This is what I was thinking as well. Doing it this way, we do not care how the 1 billion words are stored: sorted, not sorted, etc. In fact, since an index is used for the 1 billion names, it may be fair to assume that the words may not be sorted, the assumption being that the index is sorted. So, this method is good.

- smallchallenges February 27, 2014

Can you explain it with an example please?

- Aman February 27, 2014

smallchallenges,

If the dictionary is not sorted, how is this approach going to narrow down the search?
Say getWord(i) < word < getWord(2i). That tells us nothing, because if the dictionary is not sorted the word could be anywhere.

- qxixp February 27, 2014

qxixp: the idea is to work on the index (i=1...billion) and not the file storing the words. Hope this helps.

- smallchallenges February 28, 2014

I still don't get it... the words should be sorted.

- Anonymous February 28, 2014

The way I understand it, this algorithm has O(N) complexity. If the word to be found is at the second-to-last index, then this algorithm will traverse all the elements. If not, then perhaps I didn't understand the algorithm; please elaborate.
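One way to check this concern is to count the getWord calls directly. The sketch below uses a synthetic stand-in for the billion-word dictionary: zero-padded numbers, which sort the same way as strings and as integers. It assumes 0-based indices and a getWord that returns null past the end, and the class and field names are hypothetical. For a word at the second-to-last index, the doubling phase makes 31 probes and the binary search at most 30, so roughly 60 in total rather than a billion.

```java
public class ProbeCount {
    static final int SIZE = 1_000_000_000;   // hypothetical dictionary size
    static int probes = 0;                   // number of getWord calls so far

    // Synthetic getWord: index 42 holds "0000000042", and so on.
    // ASSUMPTION: returns null for any index past the end.
    static String getWord(int index) {
        probes++;
        return index < SIZE ? String.format("%010d", index) : null;
    }

    // Repeated doubling followed by binary search.
    // SIZE is below 2^30, so the doubling cannot overflow an int here.
    static int indexOf(String target) {
        int lo = 0;
        int hi = 1;
        while (true) {
            String w = getWord(hi);
            if (w == null || w.compareTo(target) >= 0) {
                break;
            }
            lo = hi + 1;
            hi *= 2;
        }
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            String w = getWord(mid);
            int cmp = (w == null) ? 1 : w.compareTo(target);
            if (cmp == 0) {
                return mid;
            }
            if (cmp > 0) {
                hi = mid - 1;
            } else {
                lo = mid + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int found = indexOf(String.format("%010d", SIZE - 2));
        System.out.println("index=" + found + " probes=" + probes);
    }
}
```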

- kr.neerav March 01, 2014

The following code runs in O(m log n), where m is the length of the given string, because each of the O(log n) steps may compare up to m characters. This can be optimized a little without any change in the complexity.

// Assumes two static fields: dictionary (a sorted String[]) and givenWord.
public static int binarySearch(int start, int end)
    {
        // start and end are inclusive bounds into dictionary
        while(start <= end)
        {
            int mid = start + (end - start)/2;   // avoids int overflow of start+end

            // compareTo compares character by character, so this is O(m)
            int cmp = dictionary[mid].compareTo(givenWord);

            if(cmp > 0)
            {
                end = mid - 1;     // givenWord is in the lower half
            }
            else if(cmp < 0)
            {
                start = mid + 1;   // givenWord is in the upper half
            }
            else
            {
                return mid;
            }
        }

        return -1;   // givenWord is not in the dictionary
    }

- woo February 27, 2014

Binary search

- YIJIN February 28, 2014

A trie is a better implementation for a dictionary, and the search complexity will be O(n).

But the space complexity is a concern when it is on the order of a billion words.

A prefix tree could be another way, which could give you O(nlogn) complexity.
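For reference, a trie (another name for a prefix tree) that maps each word to its dictionary index might look like the sketch below. It assumes the trie is built up front by calling getWord once per index, which the question does not mention and which costs time and memory proportional to the whole dictionary. A lookup then takes time proportional to the length of the word, which is one way to read the O(n) above. The class name WordTrie is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class WordTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int index = -1;   // dictionary index if a word ends here, otherwise -1
    }

    private final Node root = new Node();

    // Records that word lives at the given dictionary index.
    void insert(String word, int index) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.index = index;
    }

    // Returns the stored index of word, or -1 if it was never inserted.
    int indexOf(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                return -1;
            }
        }
        return node.index;
    }
}
```

Because building it touches every word, this only pays off when many lookups are made against the same dictionary; for a single lookup, binary search over getWord is cheaper.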

- kirankumarcelestial February 28, 2014

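1. Binary search will give an O(log n) solution:
lo, hi: current index range, initially 0 and the maximum index of the dictionary
W: Word whose idx is to be found

int findIdx(int lo, int hi, const string& W){
	if (lo > hi) return -1;	//W is not in the dictionary
	int mid = lo + (hi - lo)/2;
	string W1 = getWord(mid);
	int temp = W1.compare(W);
	if (temp > 0){
		//W1 > W. search in lower half
		return findIdx(lo, mid - 1, W);
	}
	else if (temp < 0){
		//W1 < W. search in upper half
		return findIdx(mid + 1, hi, W);
	}
	else{
		//word matched, return this idx
		return mid;
	}
}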

- puneet.sohi March 02, 2014

2. A trie could be an OK solution, but
a) It's not mentioned that the dictionary is implemented as a trie
b) Lookup is O(n), which is > O(log n) for binary search

- puneet.sohi March 02, 2014

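Using the Dictionary data structure of C# (note that this is a linear scan, so it is O(n) rather than O(log n)):

// requires: using System; using System.Collections.Generic;
Dictionary<int, string> words;
public int getIndex(string inPut)
       {
           string target = inPut.Trim();
           foreach (var k in words)
           {
               // compare case-insensitively so that "Repeat" matches "REPEAT"
               if (string.Equals(k.Value.Trim(), target, StringComparison.OrdinalIgnoreCase))
               {
                   return k.Key;
               }
           }

           return -1;   // not found
       }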

- perfect March 10, 2014