Amazon Interview Question
Software Engineer / Developer. Country: United States
Interview Type: In-Person
Create a hash where multiple keys point to a shared object {string, count},
so hash(3), hash(4), hash(10), hash(2) point to {str -> aa, count -> 0}
and hash(3), hash(9), hash(14), ... point to {str -> bb, count -> 0}
### here the key 3 points to both aa and bb
maxCount = 0
maxStr = ''
for each num in input
    for each obj in map.get(num)   ### map.get(3) will return aa, bb, ..., etc.
        obj.count++
        if (obj.count > maxCount)
            maxCount = obj.count
            maxStr = obj.str
        end if
    end for
end for
return {maxCount, maxStr}
1) Create a Map with the number as key and a HashSet of strings as value.
2) Given a number, look up its value in the map and iterate through its strings, building a second map keyed by string whose value counts how many times that string has been seen across the input numbers' sets.
3) Once step 2 has been repeated for all input numbers, this second map holds how many times each string appeared in the numbers' string sets. Run through it and find the string with the highest count.
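The steps above can be sketched in Java. The tags `aa`/`bb` and the sample numbers come from the question; the array-based method signature is just for illustration:

```java
import java.util.*;

public class InvertedIndexMatch {
    // tags[i] labels the i-th line; lineNums[i] holds that line's numbers.
    // Step 1 builds a number -> line-index map; steps 2-3 count query hits.
    public static String bestTag(String[] tags, int[][] lineNums, int[] query) {
        Map<Integer, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < lineNums.length; i++)
            for (int num : lineNums[i])
                index.computeIfAbsent(num, k -> new ArrayList<>()).add(i);

        int[] counts = new int[tags.length];
        int best = 0;  // index of the line with the most hits so far
        for (int num : query)
            for (int i : index.getOrDefault(num, Collections.emptyList())) {
                counts[i]++;
                if (counts[i] > counts[best]) best = i;
            }
        return tags[best];
    }

    public static void main(String[] args) {
        String[] tags = {"aa", "bb"};
        int[][] lines = {{3, 4, 10, 2}, {3, 9, 14}};
        System.out.println(bestTag(tags, lines, new int[]{3, 4, 12}));  // aa
    }
}
```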
Sort of a brute-force approach, optimized a bit by sorting and leapfrogging. I'm sure someone will do better. I was thinking of using a bitmap but couldn't work out how to make it more efficient.
input = sort(input)
while (get(line)) {
    data[line[0]] = sort(line[1-end])
}
winner = list[]
win_matches = 0
foreach tag in data {
    comp = data[tag]
    matches = 0
    index = 0
    len = length(data[tag])
    foreach elem in input {
        while (index < len) {
            if (comp[index] > elem) {
                break
            } else if (comp[index] == elem) {
                matches++
                // break /* uncomment to count duplicate number matches */
            }
            index++
        }
    }
    if (matches > win_matches) {
        winner = list(tag)
        win_matches = matches
    } else if (matches == win_matches) {
        winner = append(winner, tag)
    }
}
print(rand(winner))
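The sorted "leapfrog" comparison above is a two-pointer walk over two sorted arrays. A minimal Java sketch of just that helper (method name and sample arrays are illustrative):

```java
import java.util.Arrays;

public class LeapfrogMatch {
    // Count common elements between two arrays by sorting both and
    // advancing whichever pointer holds the smaller value.
    public static int countMatches(int[] input, int[] line) {
        int[] a = input.clone(), b = line.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        int i = 0, j = 0, matches = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j])      { matches++; i++; j++; }
            else if (a[i] < b[j])  i++;   // leapfrog the input pointer
            else                   j++;   // leapfrog the line pointer
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(countMatches(new int[]{3, 4, 12}, new int[]{3, 4, 10}));  // 2
    }
}
```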
1. For each line in the input file, create a hashset of integers, i.e. a vector of hashsets.
2. For each integer in the input string:
   i) Check each hash_set. If all input integers are found in a single hash_set, print it. Otherwise keep track of max_count and the corresponding hash_set index, and print that at the end.
m -> number of lines in the input file
n -> max number of integers in a single line
p -> number of integers in the input
Time complexity: O(m*n + p*m) (m*n to build the hash_sets, p*m for checking)
Space complexity: O(m*n)
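A minimal Java sketch of this vector-of-hashsets idea; the lines are passed in as arrays for illustration (real code would read them from the file):

```java
import java.util.*;

public class PerLineSets {
    // Build one HashSet per line, count query hits against each set,
    // and return the index of the best-matching line.
    public static int bestLine(int[][] lines, int[] query) {
        List<Set<Integer>> sets = new ArrayList<>();
        for (int[] line : lines) {
            Set<Integer> s = new HashSet<>();
            for (int n : line) s.add(n);
            sets.add(s);
        }
        int bestIdx = -1, bestCount = -1;
        for (int i = 0; i < sets.size(); i++) {
            int count = 0;
            for (int q : query)
                if (sets.get(i).contains(q)) count++;
            if (count > bestCount) { bestCount = count; bestIdx = i; }
        }
        return bestIdx;
    }

    public static void main(String[] args) {
        int[][] lines = {{3, 4, 10}, {3, 9, 14}};
        System.out.println(bestLine(lines, new int[]{3, 4, 12}));  // 0
    }
}
```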
Use a trie. The entire file can be preprocessed in O(n) time, where n is the number of integers in the file. Once this is done, you can find the best-matching line in O(m), where m is the number of integers in the input.
To be independent of the ordering of the input integers, when creating the trie, insert entries for every permutation of the line's integers.
For example for the first line 3,4,10 - create entries in trie for
3,4,10
4,3,10
10,4,3
10,3,4
4,10,3
3,10,4
The above problem becomes similar to finding the longest common subsequence between two arrays of integers.
The extension would be a longest common subsequence between a constant array (the input) and a list of integer arrays (from the file). The array from the list with the longest common subsequence is our answer.
This approach requires negligible memory and is almost an in-place search algorithm. Its complexity is O(m*n1 + m*n2 + m*n3 + ...), where m is the length of the input and n(i) is the length of the i-th line's array.
This approach is much easier to implement than the hashmap solution and has almost O(1) space complexity, whereas the hashmap solution's worst-case space complexity, when all the numbers in the list are distinct, is O(n1 + n2 + n3 + ...).
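For reference, the standard LCS-length DP this comment refers to, sketched in Java with two rolling rows. (Note that if the match is order-independent, as in the original question, a plain set intersection suffices; LCS matters only when order must be preserved.)

```java
public class LcsMatch {
    // Classic longest-common-subsequence length DP, keeping only two rows.
    public static int lcsLength(int[] a, int[] b) {
        int[] prev = new int[b.length + 1];  // row for a[0..i-1]
        int[] cur  = new int[b.length + 1];  // row being filled
        for (int i = 1; i <= a.length; i++) {
            for (int j = 1; j <= b.length; j++) {
                cur[j] = (a[i - 1] == b[j - 1])
                       ? prev[j - 1] + 1
                       : Math.max(prev[j], cur[j - 1]);
            }
            int[] t = prev; prev = cur; cur = t;  // roll the rows
        }
        return prev[b.length];
    }

    public static void main(String[] args) {
        System.out.println(lcsLength(new int[]{3, 4, 12}, new int[]{3, 4, 10}));  // 2
    }
}
```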
Does anyone have a better solution than O(n)? As far as I can see, it can easily be done in that time.
Here is the algorithm:
1) Read the file line by line.
2) Extract the numbers from each line and match them against the given numbers, keeping a max that holds the best string and its number of matches.
Each time you count a line's matches, if they exceed the max, set max = current string.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;
import java.util.Map.Entry;

public class Program2 {
    static Map<String, Set<String>> dic = new HashMap<>();

    public static void main(String[] args) throws IOException {
        String fileName = "input.txt";
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            for (String line; (line = reader.readLine()) != null;) {
                TreeSet<String> set = new TreeSet<>();
                Collections.addAll(set, line.split(" "));
                // Letters sort after digits lexicographically, so pollLast()
                // pulls the tag (e.g. "aa") out of a line like "aa 3 4 10".
                dic.put(set.pollLast(), set);
            }
        }
        System.out.println(getKey(Arrays.asList("3", "4", "10")));
        System.out.println(getKey(Arrays.asList("12", "3", "4")));
        System.out.println(getKey(Arrays.asList("3", "9")));
        System.out.println(getKey(Arrays.asList("3", "4", "12")));
    }

    // Returns the tag of the first line containing ALL query numbers,
    // or "Nil" if none does (note: this is an exact-superset lookup,
    // not a best-partial-match search).
    public static String getKey(Collection<String> nums) {
        for (Entry<String, Set<String>> entry : dic.entrySet()) {
            if (entry.getValue().containsAll(nums)) return entry.getKey();
        }
        return "Nil";
    }
}
Since the file contains several thousand lines/numbers, it would be more efficient to preprocess the input in O(n) time, storing the input numbers in a HashSet. E.g. 3, 4, 10 would be stored in the HashSet.
- Anonymous April 05, 2012

Now, as you read the file sequentially, check whether each number in the current line is contained in the HashSet; if yes, increment the count for that string.
At the end of each line, compare this count with the max count; if it is greater, store it as the new max count along with the corresponding string.
Thus only one pass through the file will be required. Complexity will be O(m + n).
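A one-pass Java sketch of this approach. The line format "tag n1 n2 ..." is an assumption, since the thread never pins it down:

```java
import java.util.*;

public class OnePassScan {
    // Store the query numbers in a HashSet, then scan each line once,
    // counting how many of its numbers are in the set.
    public static String bestTag(String[] fileLines, int[] query) {
        Set<Integer> wanted = new HashSet<>();
        for (int q : query) wanted.add(q);

        String bestTag = null;
        int bestCount = -1;
        for (String line : fileLines) {
            String[] parts = line.trim().split("\\s+");  // "tag n1 n2 ..." assumed
            int count = 0;
            for (int i = 1; i < parts.length; i++)
                if (wanted.contains(Integer.parseInt(parts[i]))) count++;
            if (count > bestCount) { bestCount = count; bestTag = parts[0]; }
        }
        return bestTag;
    }

    public static void main(String[] args) {
        String[] lines = {"aa 3 4 10", "bb 3 9 14"};
        System.out.println(bestTag(lines, new int[]{3, 4, 12}));  // aa
    }
}
```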