Amazon Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: Phone Interview
I would start with asking questions to see what information is available to narrow the search.
1. Is there a city, state or zip code saved for customers
2. Can we assume that their area code is correct in correlation with their location
(i.e within which # range can we check their area code? to their zip code, city or state)
3. We have accesses to a list of the valid area codes for the location rage.
For my solution these have to be true, which is completely realistic for a shopping site. Further more if we are only looking at lan line number, and not a cell, we most likely won't have the issue of number 2.
From here you would do database queries to pull all customer ids and numbers from each zip code, city or state. Next you cycle through all the customers area codes until one comes up does not match the list of valid area codes given by assumption 3.
If you do not have access to 3, then for each location query (zip, city, or state) you just have to keep a hash table where the area code is the key and the value is how many customers had it for that area. Then at the end you just find the key that has a value of 1, meaning that only one customer in the that location had that area code.
but there is 1 millions files so i don't think hash will help .. how we will fetch these files into memory. and what datstructure we should use for this.what would be size of DS?
The assumption here is that by breaking down the search area we aren't dealing with all the files at once. Doing a hash map for each (state, city or zipcode), wouldn't be large at all. There are only 25 area codes for California, and only 21 for Texas. All you are doing is cycling through DB entries and checking the number. So you do a query like:
"SELECT id, number
FROM millionFiles
WHERE state = {$state}";
Then cycle through the entries given back by the query, by using mysql_fetcharray() in php.
So you would basically have 2 nested for loops where the top would cycle through countries, the next through the states of that country. And for each query you ran for the state, you would cycle through the results and populate the hash. Then when the last person is checked from the results, and added to the hash, check the hash for a value of 1. If it exists, that is the possible area code that is wrong.
If you are doing a global search where you want to test all numbers for every country, regardless of the first singleton area code was found. Then I would create a data structure for saving the id, and area code for each time the value of 1 is found in the hash, after the state area code loop. You are creating a hash each time a new state is looped through, then discarding it once the loop is finished.
On the other hand instead of just adding all area codes to the hash and incrementing, you could just check to see if each individual in the result had an area code that was in the list of valid area codes for that state. As soon as an area code is hit that does not match up to the valid area codes of that state, you have your match.
The question states _files_. No mention of indexes created etc. So it seems more like a question of scripting, like perhaps using grep...
- Anonymous May 02, 2012