## Amazon Interview Question

Software Engineer / DevelopersYou parse the three files and construct a graph of Customer and pages visited.

If a customer say, C1 visits page P1, then there is an edge between C1 and P1.

Also, all the customer's have a count which is initialized to 0, basically a bitmap of 000. Each bit represents a day. So, if atleast one distinct page is referenced on

day n, the nth bit is set.

Before the page is added to the customer's node (as an edge), we need to check

if the page was already present. If not, then depending upon which day's file

is being parsed we OR the customer's count with the day's bit.

For e.g. C1 has accessed P1 and P2 on day 1 and so it's bitmap count is

001 (C1->P1->P2)

On day 2, if we see that C1 accesses P2 and P3,

then count becomes 011.

OTOH, if C1 accesses P2, and P1 again on day2 and accesses unique page

on day3, the count becomes 101.

So, all the customers that have count either 101, 110 or 011 follow the criteria given in the problem statement.

create a hash table such that customer Id is key and page visited is value. the value should hold the set properties. Something on this line Map<customerId, Set<pageId>>.

we will be having three hash tables h1,h2 and h3.

start with h1. for each entry in h1 check their occurrence in h2 and h3.

if entry occurs in exactly one hash table combine both Set and check the size >2. if yes add it into result hash table.

Now , for each entry in h2 check it occurs in h3 and not in result hash table. if entry satisfies this add their Set and check size >2 if yes add to result hash table.

finally your result hash table has required result.

For optimizing this, size of h1 < size of h2 < size of h3

How about combining newcomer's and tito's solutions and using a bitmap in the value of hashmap and keep ORing the value if you find the customerId present in the Map.

not sure why bother so much. I think just create a data structure which holds a set of days and a set of pages. Then map the customer ID to structure.

Then go throw the log and build the map. Once done, scan the whole map and find those customers who satisfy the condition. -- Easy to read, easy to code.

Have a 2D array with row's as customer id and column as page id. For each entry from the log file, increment counter in appropriate cell.

```
if(count(array[i][...] == 2) >= 3 )
return customer id
```

find out all the customers who visited on all 3 days

- Anonymous March 29, 2010Subtract it from the ones who visited only once.