Google Interview Question
Software Engineer / DevelopersCountry: United States
Right. Depends on the scenario.
Is it storing for ftp transfer to 100 other machines?
Will you run queries on that later? etc etc
Yes, really depends on what you want to do with these and if there are any patterns.
If the numbers tend to be assigned in batches you could just the numbers where the batches begin and end.
ie.
//until 416 200 0999 are unused
// 416 200 1000 to
// 416 300 0999 are in use
phone[0] = 416 200 1000;
phone[1] = 416 300 0999;
phone[2] = <next number in use>
phone[3] = <next number not in use>
If space is not at a premium just assign a boolean to each element indexed by the phone number.
i.e. as many indices as the largest phone number..
(you can still reduce by /8 by using every bit per byte)
The key here is that 1 million phone numbers will be within some range, likely 10 million or so. 10 million bits = 10^7 bits ~ 0.12 GB. Just have a bit array where the first bit being set implies that the first phone number is set (e.g., 10 000 000) and the last bit indicates the last phone number (10 999 999). If you find a number in the list, just set the appropriate bit. You now have a bit vector of size 1million bits, sorted, and to check for a particular number, you just check whether the corresponding bit is set.
This is not the best solution. The phone numbers are typically 13 digits in size, should mean a vector of size 10^13. You can not just create a bit vector of same size as number of phone numbers, you can not map it back to corresponding number.
Secondly, I assume interviewer is expecting more of STORAGE - as in saving it in database or file (XML file or regular file).
Am I missing something? 10^7 bits should be roughly 1.2*10^6 bytes, which is 1.2M bytes, not 0.12GB(120MB).
We can use suffix tree (or) trie. Given the fact that many of the phone numbers will have the same prefix. The height of suffix tree would be 10 (+1 empty root node).
- Mario September 04, 2012