Amazon Interview Question
Software Engineer / Developers | Country: United States
Put the data in a hash keyed by customer id. Since there will be multiple entries per customer, the value is a chain of the pages that customer visited.
Say I number the pages:
Homepage = 1
Shopping cart = 2
Listmania = 3
Then for the customer with id 1130, say, the hash will have an entry like 1->4->3->2.
After building the hash, scan all the customers and build a max-heap on the count of each 3-page sequence.
For 1->4->3->2 the sequences are:
1->4->3, i.e. 143
4->3->2, i.e. 432
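The per-customer chains and overlapping 3-page windows described above can be sketched in Python; the log entries and ids here are made-up examples, and `Counter.most_common` stands in for the max-heap:

```python
from collections import defaultdict, Counter

# log entries as (customer_id, page_id) pairs, in chronological order
log = [(1130, 1), (1130, 4), (1130, 3), (1130, 2),
       (2210, 1), (2210, 4), (2210, 3)]

# hash keyed by customer id; the value chains the pages visited
pages_by_customer = defaultdict(list)
for customer_id, page_id in log:
    pages_by_customer[customer_id].append(page_id)

# count every overlapping 3-page sequence: 1->4->3->2 yields 143 and 432
seq_counts = Counter()
for pages in pages_by_customer.values():
    for i in range(len(pages) - 2):
        seq_counts[tuple(pages[i:i + 3])] += 1

# most_common(1) plays the role of the max-heap: top sequence by count
print(seq_counts.most_common(1))   # [((1, 4, 3), 2)]
```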
// create an ArrayList to hold page trios
ArrayList<Trio> mostVisited = new ArrayList<Trio>();

class Trio {
    int counter = 1;
    int p1, p2, p3; // the three pages visited, in order

    void setP1(int uniqueID) { p1 = uniqueID; } // same pattern for setP2 & setP3

    // equals() must be overridden so contains()/indexOf() match by page ids
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Trio)) return false;
        Trio other = (Trio) o;
        return p1 == other.p1 && p2 == other.p2 && p3 == other.p3;
    }
}

// after a trio of pages has been visited and their uniqueIDs set to p1, p2, p3,
// check if that trio already exists within the ArrayList
int idx = mostVisited.indexOf(t);
if (idx >= 0) // if it exists, increment its count
{
    mostVisited.get(idx).counter++;
}
else // or add the trio to the ArrayList
{
    mostVisited.add(t);
}

// then . . .
// to query the most visited sequence, scan the list with a for loop, O(n)
int mostVis = 0;
for (int i = 0; i < mostVisited.size(); i++)
{
    if (mostVisited.get(i).counter > mostVisited.get(mostVis).counter)
    {
        mostVis = i;
    }
}

// finally get the Trio from the mostVisited ArrayList at the index mostVis
Trio finalTrio = mostVisited.get(mostVis);
Maintain a map user_map[key]:
key = userid
user_map[key] = page sequence queue of max size 3
Additionally maintain a map visit_count_map[key]:
key = page sequence queue of size 3
visit_count_map[key] = visit count for the page sequence denoted by key
For each line in log file:
    user = userid
    page = pageid
    if user_map contains user:
        if user_map[user].size() == 3:
            user_map[user].pop()    # drop the oldest page
        user_map[user].push(page)
        if user_map[user].size() == 3:
            visit_count_map[ user_map[user] ]++
    else:
        user_map[user] = new queue of max size 3
        user_map[user].push(page)
print the key with maximum value in visit_count_map[]
Hash table can be used instead of map.
Time Complexity : O( L * (log n + log(k^3)) + k^3 ) with a tree-based map, where L = number of lines in the log file, n = number of users, k = number of page types;
O( L + k^3 ) expected with a hash table.
Space Complexity : O(k^3) for the sequence counts + O(n) for the per-user queues.
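A minimal Python sketch of the two-map approach above, using a deque with maxlen=3 as the per-user page queue (the sample log and ids are made up):

```python
from collections import deque, defaultdict

# (userid, pageid) pairs, one per log line, in chronological order
log = [("u1", "A"), ("u1", "B"), ("u1", "C"), ("u1", "D"),
       ("u2", "B"), ("u2", "C"), ("u2", "D")]

user_map = {}                       # userid -> last 3 pages visited
visit_count_map = defaultdict(int)  # 3-page sequence -> visit count

for user, page in log:
    if user not in user_map:
        user_map[user] = deque(maxlen=3)   # oldest page dropped automatically
    user_map[user].append(page)
    if len(user_map[user]) == 3:
        visit_count_map[tuple(user_map[user])] += 1

# the key with the maximum value is the most common 3-page sequence
print(max(visit_count_map, key=visit_count_map.get))   # ('B', 'C', 'D')
```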
My Python code, according to chatbot:
input = [ {"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "ttads", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Product Details"},
{"user_id" : "ttads", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Product Details"},
{"user_id" : "ttads", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Home Page"}
];
seq_rank = {}
user_set = {}
page_type_inv = ["Home Page", "Product Details"]
page_type = {}
for index in range(len(page_type_inv)):
    page_type[page_type_inv[index]] = index
page_type_size = len(page_type_inv)
print(page_type)
# page_type = {"Home Page": 0,
#              "Product Details": 1}
print(page_type_size)
for item in input:
    if item["user_id"] in user_set:
        # keep the last two pages, shift left one base-K digit, append the new page
        user_set[item["user_id"]]["page_type_fifo"] %= page_type_size * page_type_size
        user_set[item["user_id"]]["page_type_fifo"] *= page_type_size
        user_set[item["user_id"]]["page_type_fifo"] += page_type[item["page-type-id"]]
        user_set[item["user_id"]]["count"] += 1
        if user_set[item["user_id"]]["count"] > 2:
            if user_set[item["user_id"]]["page_type_fifo"] in seq_rank:
                seq_rank[user_set[item["user_id"]]["page_type_fifo"]] += 1
            else:
                seq_rank[user_set[item["user_id"]]["page_type_fifo"]] = 1
    else:
        user_set[item["user_id"]] = {}
        user_set[item["user_id"]]["page_type_fifo"] = page_type[item["page-type-id"]]
        user_set[item["user_id"]]["count"] = 1
print(seq_rank)
rank_max = 0
rank_max_key = 0
for key in seq_rank:
    if rank_max < seq_rank[key]:
        rank_max = seq_rank[key]
        rank_max_key = key
third_page_type = page_type_inv[rank_max_key % page_type_size]
rank_max_key //= page_type_size
second_page_type = page_type_inv[rank_max_key % page_type_size]
first_page_type = page_type_inv[rank_max_key // page_type_size]
print("Max Order Seq is:", first_page_type, "==>", second_page_type, "==>", third_page_type)
print(input)
First solution:
Keep a hash of the most recent pages visited by each customer:
customer_name -> [page_name, page_name, page_name]
Keep a hash of page-tuples and their frequency:
(page_name, page_name, page_name) -> count
Refinement:
Build ids for customers and pages on the fly with two hashes and two integers:
customer_name -> customer_id (w/max_customer_id)
page_name -> page_id (w/max_page_id)
Since it's the same pattern for both, you can encapsulate the id generation with a class that puts a light wrapper around a hash, with an API of get_id_for_name(name) -> id and get_name_for_id(id) -> name.
Then your other data structures become more compact (and probably faster, depending on the hash implementation):
customer_id -> [page_id, page_id, page_id]
(page_id, page_id, page_id) -> count
For each line of the file, extract the customer name and page name, and then get their ids. Then update the customer_id's pages, truncating the array as needed. Then update the hash of counts.
At the end sort the values of the count hash to get the most popular page sequence, and then convert the ids back to full names for the final output.
P.S. If you use the id approach, you'll be tempted to maintain a second data structure for reverse lookups, but it's not worth it if you're only ever going to do one reverse lookup. If the problem statement evolves and you end up wanting multiple reverse lookups, then maintain a second data structure for them; and since the ids are sequential, you'd prefer a vector to a hash.
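The id-generation wrapper described above might look like this in Python; the class name and example names are assumptions, but the API matches the get_id_for_name / get_name_for_id sketch:

```python
class IdRegistry:
    """Light wrapper around a hash that hands out sequential ids for names."""

    def __init__(self):
        self._id_for_name = {}   # name -> id
        self._names = []         # index i holds the name for id i (a vector, not a hash)

    def get_id_for_name(self, name):
        if name not in self._id_for_name:
            self._id_for_name[name] = len(self._names)
            self._names.append(name)
        return self._id_for_name[name]

    def get_name_for_id(self, id):
        return self._names[id]

customers = IdRegistry()
print(customers.get_id_for_name("alice"))   # 0
print(customers.get_id_for_name("bob"))     # 1
print(customers.get_id_for_name("alice"))   # 0 again, already registered
print(customers.get_name_for_id(1))         # bob
```

Because ids are handed out sequentially, the reverse lookup is a plain list indexed by id, which is the vector-over-hash choice the P.S. recommends.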
OK, my answer goes like this:
Maintain a hash of users to the pages they have visited:
hash<userid, arraylist<visitsequence>>
Now let's say user 234 has visited the sequence 1-6-5-3.
We add pages to the array list as they are visited. When the visit sequence's size reaches 3, we take that sequence as a key and increment its count in a second hashmap<visitsequence, count>, then remove the first entry (by doing visitsequence.remove(0)) from the original hash map. This is so the other overlapping sequences get counted too.
For example, user 234 visits 1-5-6-7:
when the sequence becomes 1-5-6, we put it in the second hashmap and increment its count there; removing the first entry leaves 5-6; when the user next visits 7, the sequence becomes 5-6-7, so that too gets counted in the second hashmap.
So this is my take on this:
- find the most visited page
- find the next most visited page, with the condition that the second page must come right after the 1st page, since the log is chronological
- repeat the second step for the 3rd page
If other pages have the same count, repeat the process for them too, but I think the answer should come up the first time.
1. encode each possible sequence of pages as a number. If we have K possible page types, our number will look like nnn where 0 <= n < K, i.e. a 3-digit number in base K. For example, if we had only 3 types of pages (A, B and C), we could assign A->0, B->1, C->2. So our base would be 3, and the B->A->C transition would be encoded as 1*3^2 + 0*3^1 + 2*3^0 = 102 (base 3) = 11 (decimal) = b (hex).
- chatbot March 14, 2013
2. create an array of ints of K*K*K size, initialized to 0. So if we have "several dozen" page types (several meaning 4, for example, i.e. 12*4 = 48 types), we would need 48^3 array elements (~110K).
2'. Or, if we don't want to allocate memory for all possible sequences, keep a map <sequence id> -> <number of times>.
3. sequentially scan the log, keeping track of and updating each user's encoded last-3-pages state (i.e. keep it in a map <user id> -> encoded sequence). Increment the corresponding cell in the array (or map) created at [2].
4. select the greatest value from the array (or map) and decode its position (id) back into a page sequence.
Overall O(n) in the number of log lines.
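The base-K encoding from step 1 can be checked with a short sketch, using the A, B, C example (helper names are mine):

```python
pages = ["A", "B", "C"]                    # A->0, B->1, C->2
code = {p: i for i, p in enumerate(pages)}
K = len(pages)

def encode(seq):
    # nnn in base K: the first page is the most significant digit
    n = 0
    for p in seq:
        n = n * K + code[p]
    return n

def decode(n):
    # peel off base-K digits from least to most significant, then reverse
    seq = []
    for _ in range(3):
        seq.append(pages[n % K])
        n //= K
    return seq[::-1]

# B->A->C = 1*3^2 + 0*3^1 + 2*3^0 = 11 decimal, i.e. 102 in base 3 or b in hex
print(encode(["B", "A", "C"]), hex(encode(["B", "A", "C"])))   # 11 0xb
print(decode(11))                                              # ['B', 'A', 'C']
```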