Amazon Interview Question
Software Engineer / DevelopersWell Collision is something that can only be applicable to HashMaps (or any collection like HashTable and other variations). Collision handling techniques are well known in theory - chaining, probing, re-hashing etc. - not sure which tecnhinque Java Collections API uses internally. I dont see how threading can cause collisions in particular - i mean collisoin is caused when the same key gets transformed into the same hashcode - this situation can occur in a single threaded system as well - it is simply caused by the hash funtion used, not how many threads are running the system.
aquila.25,
your correct collisions are caused by dupiclate indexes being made from the same key in the (key, value) pair of the .put() method in the Map collection(which HashMap and HashTable implements). Basically, a collision is the same key different value - same hashcode. In the .put() of the Collections API, java creates an index based on the hashcode() and the key.
This is vulnerable during threading and/or when the Map is getting close to maximum capacity(like for an example there are over 80,000 different indexes(when the Map is getting full it may not pickup the repeated keys))
How threading is vulnerable is becuase you will have the indexes being created concurrently and then when the process is finished the final Map is just combined. Example- you have 1000 records being converted into a HashMap(one record for each key,value pair with the value being the record) and you start a thread to create a new thread every 100 records. if you repeat a key then you will cause a collision. the system will not be able to pick up duplicates due to the threads running concurrently.
You were correct in how to handle this isssue - if you are going to run multi-thread or if the HapshMap.size() is getting close to capacity use chaining or some other technique to afford collisions.
Applies a supplemental hash function to a given hashCode, which
* defends against poor quality hash functions. This is critical
* because HashMap uses power-of-two length hash tables, that
* otherwise encounter collisions for hashCodes that do not differ
* in lower bits. Note: Null keys always map to hash 0, thus index 0.
*/
static int hash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
Applies a supplemental hash function to a given hashCode, which
* defends against poor quality hash functions. This is critical
* because HashMap uses power-of-two length hash tables, that
* otherwise encounter collisions for hashCodes that do not differ
* in lower bits. Note: Null keys always map to hash 0, thus index 0.
*/
static int hash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
The question is about collisions; not which collection is thread safe. Threading can cause collisions for a specific collection type. So the question is asking: how does threading cause a collision? plus, name another way in a collection could become vulnerable to collisions? which collection type is more vulnerable to collisions? what is a collision? how does the Collection API, deal with making collisions somewhat safe for the specific collection(that is vulnerable to collisions)? Also, i left out, how would you write code to help deal with collisions?
- biggied88 March 03, 2010