Google Interview Question
Software Engineer / Developer
Country: United States
Interview Type: Phone Interview
That's an ideal answer to come up with, but I would like to know whether Google was happy with this answer, assuming this approach was presented.
Would the time complexity not actually be O(m+n), since it takes O(1) time to put each of the first bag's m = 1000 elements into a hash table?
The question leaves one condition unspecified: whether a duplicated element should be reported once or as many times as it occurs.
- If the expected result is without duplicates:
1. Use a HashSet, not a HashTable. They are similar but not identical structures.
2. Remove each element from the HashSet once it has been found - otherwise the same element can be reported several times.
- If the expected result counts duplicates:
1. Use a HashTable that stores each element's duplicate count as its value, and decrement the count on each match.
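Both variants above can be sketched in a few lines of Java. This is a minimal illustration, not the poster's code; the method names and the use of `HashMap` for the counted variant are my own choices.

```java
import java.util.*;

public class BagIntersection {
    // Variant 1: each common element is reported at most once.
    static List<Integer> distinctCommon(List<Integer> small, List<Integer> large) {
        Set<Integer> seen = new HashSet<>(small);
        List<Integer> result = new ArrayList<>();
        for (int x : large) {
            if (seen.remove(x)) {   // remove on first hit so x is reported only once
                result.add(x);
            }
        }
        return result;
    }

    // Variant 2: duplicates are counted -- each match consumes one occurrence
    // from the small bag's tally.
    static List<Integer> countedCommon(List<Integer> small, List<Integer> large) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int x : small) counts.merge(x, 1, Integer::sum);
        List<Integer> result = new ArrayList<>();
        for (int x : large) {
            Integer c = counts.get(x);
            if (c != null && c > 0) {
                counts.put(x, c - 1);
                result.add(x);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> small = Arrays.asList(1, 2, 2, 3);
        List<Integer> large = Arrays.asList(2, 2, 2, 3, 4);
        System.out.println(distinctCommon(small, large)); // [2, 3]
        System.out.println(countedCommon(small, large));  // [2, 2, 3]
    }
}
```

Note how the counted variant reports 2 twice, because it occurs twice in the small bag, but not three times.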
Doesn't it depend on what kind of data structure is used to store the million elements, and accordingly how the search would be carried out?
Sedgewick says:
Bags. A bag is a collection where removing items is not supported—its purpose is to
provide clients with the ability to collect items and then to iterate through the collected
items (the client can also test if a bag is empty and find its number of items). The order
of iteration is unspecified and should be immaterial to the client.
So in this case, we can use retainAll(java.util.Collection c), which removes every element not contained in c and so retains only the elements that are also present in the first bag. Iterating over what remains of the second bag then gives the common elements. Please let me know if this approach would be appropriate...
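A quick sketch of the retainAll idea. One caveat: as the Sedgewick quote above notes, a Bag does not support removal, so the elements have to be copied into a standard java.util collection first; the helper name `common` and the use of ArrayList here are my own.

```java
import java.util.*;

public class RetainAllDemo {
    static List<Integer> common(Collection<Integer> smallBag, Collection<Integer> largeBag) {
        // Copy the large bag so the original is untouched, then keep only
        // the elements that also appear in the small bag.
        List<Integer> result = new ArrayList<>(largeBag);
        result.retainAll(new HashSet<>(smallBag)); // HashSet makes each contains() check O(1) expected
        return result;
    }

    public static void main(String[] args) {
        List<Integer> small = Arrays.asList(1, 2, 3);
        List<Integer> large = Arrays.asList(2, 3, 3, 4, 5);
        System.out.println(common(small, large)); // [2, 3, 3]
    }
}
```

Note that retainAll keeps duplicates from the large bag: 3 appears twice in the result because it appears twice in the large bag.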
1. We could use a Bloom filter for the first bag of elements (1000 elements). Since the number of elements is small, it is easy to size a Bloom filter in this case. Then pass every element of Bag2 through the Bloom filter; if all of its bits are already set, report that element as a probable match. If a number was never added to the Bloom filter, at least one of its bit positions will be unset, so true members are never missed. Note, though, that a Bloom filter can produce false positives, so candidates should be verified against the actual first bag before being reported.
2. If the range of numbers in Bag1 is known and is very small, we could just as well use a BitSet and set the corresponding bit for each of the 1000 elements. In a second pass we check, for each element of Bag2, whether its bit is already set in the BitSet; if yes, we report that element as common. The advantage of the BitSet is that it is much more memory-efficient, and unlike the Bloom filter it is exact.
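Both filter-based ideas above can be sketched together. This is a rough illustration, not a production Bloom filter: the sizing (2^14 bits, 3 hash functions) and the second hash function are arbitrary choices of mine, and all class and method names are made up for the example.

```java
import java.util.*;

public class FilterIntersect {
    // --- Approach 1: Bloom filter over the small bag (false positives possible). ---
    static class BloomFilter {
        private final BitSet bits;
        private final int size, numHashes;

        BloomFilter(int size, int numHashes) {
            this.bits = new BitSet(size);
            this.size = size;
            this.numHashes = numHashes;
        }

        // Derive k bit indices from two base hashes (double-hashing trick).
        private int index(int item, int i) {
            int h1 = Integer.hashCode(item);
            int h2 = Integer.reverse(item) ^ 0x9e3779b9; // second, loosely independent hash
            return Math.floorMod(h1 + i * h2, size);
        }

        void add(int item) {
            for (int i = 0; i < numHashes; i++) bits.set(index(item, i));
        }

        // Never false for an added item; may be true for an item never added.
        boolean mightContain(int item) {
            for (int i = 0; i < numHashes; i++)
                if (!bits.get(index(item, i))) return false;
            return true;
        }
    }

    // --- Approach 2: plain BitSet, usable when values are small non-negative ints. ---
    static List<Integer> bitSetCommon(int[] smallBag, int[] largeBag, int maxValue) {
        BitSet present = new BitSet(maxValue + 1);
        for (int x : smallBag) present.set(x);
        List<Integer> result = new ArrayList<>();
        for (int x : largeBag)
            if (present.get(x)) result.add(x); // exact: no false positives
        return result;
    }

    public static void main(String[] args) {
        BloomFilter f = new BloomFilter(1 << 14, 3);
        for (int x = 0; x < 1000; x++) f.add(x);
        System.out.println(f.mightContain(500)); // true: no false negatives

        System.out.println(bitSetCommon(new int[]{3, 7, 9},
                                        new int[]{1, 3, 3, 9}, 10)); // [3, 3, 9]
    }
}
```

The BitSet version assumes the values fit in a small non-negative range; the Bloom filter drops that assumption at the cost of occasional false positives.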
If hashing is not universal (i.e. collisions are possible), then sort m (= 1000) and n (= 1M)
and compare them the way the merge step of merge sort does.
Complexity: O(m log m + n log n + m + n) = O(m log m + n log n)
Since m < n, this simplifies to O(n log n).
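A minimal sketch of the sort-and-merge idea, assuming int elements (the class and method names are mine):

```java
import java.util.*;

public class SortMergeIntersect {
    // Sort both bags, then walk them in lockstep, exactly like the merge
    // step of merge sort: advance whichever side holds the smaller value.
    static List<Integer> common(int[] a, int[] b) {
        int[] x = a.clone(), y = b.clone();
        Arrays.sort(x); // O(m log m)
        Arrays.sort(y); // O(n log n)
        List<Integer> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.length && j < y.length) {          // O(m + n) merge walk
            if (x[i] < y[j]) i++;
            else if (x[i] > y[j]) j++;
            else { result.add(x[i]); i++; j++; }        // match: advance both
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(common(new int[]{5, 1, 3}, new int[]{3, 9, 5, 5}));
        // [3, 5]
    }
}
```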
But note, the question just said elements in a bag. We can't assume the elements are comparable (totally ordered).
If we can, here is another way via sorting:
1) Sort the smaller bag: O(m log m)
2) Binary-search each element of the larger bag in the result of 1): O(n log m)
Total complexity: O(m log m + n log m) = O(n log m)
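The two steps above can be sketched directly with `Arrays.sort` and `Arrays.binarySearch` (the class and method names here are my own):

```java
import java.util.*;

public class BinarySearchIntersect {
    // Sort the smaller bag once, then binary-search each element of the larger bag.
    static List<Integer> common(int[] smallBag, int[] largeBag) {
        int[] sorted = smallBag.clone();
        Arrays.sort(sorted);                       // O(m log m)
        List<Integer> result = new ArrayList<>();
        for (int x : largeBag)                     // n searches, O(log m) each
            if (Arrays.binarySearch(sorted, x) >= 0) result.add(x);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(common(new int[]{4, 2, 8}, new int[]{8, 1, 2, 2, 5}));
        // [8, 2, 2]
    }
}
```

Since log m is about 10 for m = 1000, this is noticeably cheaper than sorting the million-element bag.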
First, put all 1000 elements into a hash table. Then take each of the one million elements and look it up in the hash table. If a match is found, report it as part of the solution.
- Hingle McCringleBerry September 27, 2013
This also works if you don't have enough memory to hold all one million elements: just read a chunk from disk, check it, and then read the next chunk.
Space complexity: O(m)
Time complexity: O(n)
where m is the size of the smaller bag and n is the size of the larger bag.
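Because only the small bag needs to live in memory, the large bag can be consumed as a stream, which matches the chunk-by-chunk idea above. A minimal sketch, assuming the large bag is exposed as an Iterator (in practice this would wrap a file reader); all names are mine:

```java
import java.util.*;

public class StreamingHashIntersect {
    // The small bag fits in memory as a HashSet; the large bag is consumed
    // as a stream (e.g. read chunk by chunk from disk), so only O(m) extra
    // space is ever used.
    static List<Integer> common(Set<Integer> smallBag, Iterator<Integer> largeBagStream) {
        List<Integer> result = new ArrayList<>();
        while (largeBagStream.hasNext()) {
            int x = largeBagStream.next();
            if (smallBag.contains(x)) result.add(x); // O(1) expected lookup
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> small = new HashSet<>(Arrays.asList(10, 20, 30));
        Iterator<Integer> large = Arrays.asList(5, 10, 10, 30, 40).iterator();
        System.out.println(common(small, large)); // [10, 10, 30]
    }
}
```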