## Amazon Interview Question

Software Engineer / Developers**Country:**United States

This looks like a cool solution. I was trying to work something out using 3 regular expressions, but I think your method is the way to go.

However, it will fail for terms which are anagrams. As an example, if we search for "Dictionary", the word 'Indicators' will be counted as a word which needs only 1 letter changed to get an exact match.

How can we change your technique to account for this?

@Michael - it is supposed to find "indicators" if input is "dictionary"

In the question, it is clearly mentioned that "spelling order is not important."

see my C# implementation below

I see now. In that case, the method which you and The Artist have posted is definitely the correct solution.

There is a small issue in the solution. If you are accepting words with sum = 2, you might end up taking words which are different by 2 letters also.

I think your algorithm does not consider "The list is already sorted alphabetically."

Sorted list must benefit your algorithm.

@Anjana: sum=2 should be considered when the size of the dictionary word is same as the given string

@Eric: that could be a decoy. I did not find much help from the sorted list.

Couldn't get the part where sum=2...i mean if we find exact match, sum=0, differ by 1 then sum=1. How does sum=2 fit in. Please explain.

This might be a little dumb question but nonetheless, would be great if you would explain.

@Jas: consider the example given in the question where you convert 'coverage' to 'converge'. In that case the sum will be two. Basically here you are taking one character of the given string and replacing it with any other alphabet.

Step 4 is wrong.

Suppose we have words "a" and "b".

The arrays would be a[][]=[1][0] and b[][]=[0][1].

The result would be sum=0 which means that "a" and "b" are completely equals.

I am not sure why this question would be asked(as you have to traverse the whole dictionary of million words). The better question to ask would be how would you design the dictionary so that you can print the words efficiently.

The answer to that would be :

Have your dictionary stored in key , value pair : key : count[26]array, value list of all strings that match (basically all anagrams will be in this string)

Now given word with count array G[]

There are the following three cases :

1) anagram of Given(G) + An extra alphabet

2) anagram - 1 alpha

3) change one alphabet in G

Now solution for above 3 cases :

```
1) for i = 0 to 25 : {G[i]++ ; printlist(hash(G)); G[i]--; }
2) Take old G, not from 1) . for i = 0 to 25 : {if(G[i]) {G[i]-- ; printlist(hash(G)); G[i]++; }
3) for i=0 to 25
{
G[]= given;
if(G[i])
{
G[i]--;
for j=0 to 25 -> skip j=i;
{
G[j]++;
printlist(hash(G))
G[j]--;
}
G[i]++;
}
```

public List<string> FindWord(List<string> dictionary, string word)

{

List<string> output = new List<string>();

int[] charCount = new int[26];

char[] input = word.ToCharArray();

for (int i = 0; i < input.Length; i++)

{

charCount[input[i] - 'a']++;

}

int[] temp = new int[26];

foreach (string _word in dictionary)

{

if (Math.Abs(_word.Length - input.Length) == 1 || Math.Abs(_word.Length - input.Length) == 0)

{

charCount.CopyTo(temp, 0);

char[] _input = _word.ToCharArray();

for (int i = 0; i < _input.Length; i++)

{

temp[_input[i] - 'a']--;

}

int sum = 0;

for (int i = 0; i < temp.Length; i++)

{

sum += Math.Abs(temp[i]);

}

if (sum <= 2)

{

output.Add(_word);

}

}

}

return output;

}

How about a backtracking solution? Lets say we build a trie of the one million words. lets say input word is of length n. We check all the branches and keep track of number of mismatches and if mismatch count is less than permitted we allow traversing the trie else we exit the branch and check another branch using backtracking.

This approach however checks for words with only same length. I am checking how this can be modified to words with different length.

```
backtracking(root, "coverage", 1);
backtracking(trie node,string input_word, int mismatch_allowed)
{
if (mismatch_allowed < 0)
return;
if (input_word.length == 0)
{
print current node details (i.e. complete word till this node);
return;
}
for character_branch in (all_branches_at_root)
{
if (character_branch == input_word[0])
{
// first char matches, check next chars
backtracking(character_branch_node, "overage", mismatch_allowed);
}
else {
// character mismatch reduce the count
backtracking(character_branch_node, "overage", --mismatch_allowed);
}
}
}
```

I think the key to solving this problem is to use the information that the list of words is sorted (i.e.: you already have the word list preprocessed to help you with the query).

Consider (for complexity computation):

N - number of words in the word list (max. 1 million)

L - size of a word (max. 40)

A - size of the alphabet (= 26)

For 1 letter distance you can use the following algorithm:

1. Generate all the possible words that are 1 letter distance away from the query word

- this has the complexity O(L*A)

2. Look up each of the generated words in the word list using a binary search:

- the look up of an individual word is O(L*logN)

- we have O(L*A) lookups => final complexity is O(L^2*A*logN) which is roughly (not considering the hidden constant) about 832.000 operations which is better than O(L*N) which is roughly 40 million operations.

For distance = 2, this algorithm performs worse than the O(L*N) version.

i think you missed the point in the question where it says that the order does not match which can result upto a max of (40!)/(2^14) possible combinations which is humungous.

@anjana

In the algo presented by The Artist

a minor modification is needed such that it doesnt take into account of the words such as (sumit with sumitas)

what we can do is make the changes in that space i.e O(26)

for example the word sumit has to be searched

so s-1,u-1,m-1,i-1,t-1

when the word sumita is encountered all the fields in O(26) is zero except a which is -1,

so this is our word.

now if we take sumi , then only value corresponding to field t is -1.

but if we take a word such as samit then field corresponding to u would be 1 and a would be -1,

in a iteration corresponding to the word length there can be only a pair of 1 & -1;

we wont consider if there is 1&1 or -1 & -1 which might arise in the example of (umi or sumitaf) which would be the case for difference in length corresponding to 2.

We can solve the problem by the approach artist suggested. But we can reduce the time complexity from O(mn) to O((number of words with size+number of words with size-1+number of words with+1))m.

Simply apply the binary search on the list given to find the first occurance of the word with size-1, It would take O(logn)(i.e. log one million= approximately 20). From that index onwards you can traverse the list to until you reach a word with size+2. Since the list is already sorted, We don't need to traverse the entire list.

Couldn't get the part where sum=2...i mean if we find exact match, sum=0, differ by 1 then sum=1. How does sum=2 fit in. Please explain.

This might be a little dumb question but nonetheless, would be great if you would explain.

here's a O(n*m) (n=no of words in the dictionary, m = max length of the word)complex solution with space O(52). Perform the following steps:

- The Artist November 12, 20121. Take all the words from the lis with length same as the given word or +/- 1

2. Create a letter count array/map of the given word (for eg if the word is "coverage" then a=1, c=1, e=2, g=1, r=1, v=1) (say a[26] contains the letter count from a to z)

3. take each word from step 1. create its letter count array (say b[26]

4 for i = 1 to 26 sum += abs(a[i]-b[i]). each word with sum value = 1 or 2 will be the required word.