Bloomberg LP Interview Question
Software Engineer / Developers, Country: United States
It depends on the situation; there is no single exact answer, as I reckon.
If your data is distributed evenly, so that there are few or no collisions, take a size somewhat greater than 1000 that is a prime number.
That way the performance of searching and insertion is enhanced.
If your data has a lot of collisions, you can go for a smaller table size and use chaining to link the colliding records into buckets.
Hope this helps.
bool isPrime(int n) {
    // return true if n is prime, false otherwise (trial division)
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

int HashSize(int item_num)
{
    int hashSize;
    // first prime at or above 2 * item_num + 10
    for (hashSize = 2 * item_num + 10; !isPrime(hashSize); ++hashSize)
        ;
    return hashSize;
}
It seems the answer should be smaller than the number of items; otherwise, extra space will be allocated and never used (such waste is never totally avoidable, but it should be minimized).
On the contrary, you should have more slots than items. Some items' hashes will collide even with the best hash function, and you want to avoid collisions as much as possible. I'd say something like the smallest prime greater than 1.5 x |items|.
There are a couple of questions you want to ask in this case: whether memory efficiency is an issue, whether you need results in real time, and whether the data has a lot of duplicates.
If memory efficiency isn't an issue, then you should make the hash table much larger than 1000 entries to avoid collisions, right? (I guess it depends on how you're hashing your data and the likelihood of collisions...)
If you don't need data in real time, then buckets can be implemented with linked lists to deal with collisions.
If you need data in real time, and there are a lot of duplicates, would it be better to implement buckets with a BST to improve access time? Not sure the complexity would be worth it if you're only dealing with 1000 items...
@Leo,
Not correct. The proper answer should take the load factor (call it "a") into account: for lookup time to be constant on average, we need n = THETA(m), where m = # of slots. But then again, the constant hidden in the THETA is determined by memory constraints. The real objective of the interviewer is (I reckon) to check that the interviewee understands this.
The answer is simple.
You have 1000 items and you want to access them as fast as possible. Nothing says that other items will ever be added. So of course you need 1000 slots!
Memory can't be an issue here; it's just 1000 items! (Well, unless you are programming a washing machine with very little memory... but the computers at Bloomberg have enough memory to hold a hash table with 1000 slots.)