Amazon Interview Question
Software Engineer / Developers | Country: United States
Put the data in a hash keyed by customer id. Since there will be multiple entries per customer, the value is a chain of the pages that customer visited.
Say I number the pages:
Homepage = 1
Shopping cart = 2
Listmania = 3
Then for the customer with id 1130, say, the hash will have an entry like 1->4->3->2.
After building the hash, scan all the customers and build a max-heap on the count of each 3-page sequence.
For 1->4->3->2 the sequences are:
1->4->3, i.e. 143
4->3->2, i.e. 432
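The per-customer chains and overlapping 3-page windows described above can be sketched in Python; the log entries and ids here are made-up examples, and `Counter.most_common` stands in for the max-heap:

```python
from collections import defaultdict, Counter

# log entries as (customer_id, page_id) pairs, in chronological order
log = [(1130, 1), (1130, 4), (1130, 3), (1130, 2),
       (2210, 1), (2210, 4), (2210, 3)]

# hash keyed by customer id; the value chains the pages visited
pages_by_customer = defaultdict(list)
for customer_id, page_id in log:
    pages_by_customer[customer_id].append(page_id)

# count every overlapping 3-page sequence: 1->4->3->2 yields 143 and 432
seq_counts = Counter()
for pages in pages_by_customer.values():
    for i in range(len(pages) - 2):
        seq_counts[tuple(pages[i:i + 3])] += 1

# most_common(1) plays the role of the max-heap: top sequence by count
print(seq_counts.most_common(1))   # [((1, 4, 3), 2)]
```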
// create an ArrayList to hold page trios
ArrayList<Trio> mostVisited = new ArrayList<Trio>();

class Trio {
    int counter = 1;
    int p1, p2, p3; // the three pages visited, in order

    void setP1(int uniqueID) { p1 = uniqueID; } // same pattern for setP2 & setP3

    // equals() must be overridden so contains()/indexOf() match by page ids
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Trio)) return false;
        Trio other = (Trio) o;
        return p1 == other.p1 && p2 == other.p2 && p3 == other.p3;
    }
}

// after a trio of pages has been visited and their uniqueIDs set to p1, p2, p3,
// check if that trio already exists within the ArrayList
int idx = mostVisited.indexOf(t);
if (idx >= 0) // if it exists, increment its count
{
    mostVisited.get(idx).counter++;
}
else // or add the trio to the ArrayList
{
    mostVisited.add(t);
}

// then . . .
// to query the most visited sequence, scan the list with a for loop, O(n)
int mostVis = 0;
for (int i = 0; i < mostVisited.size(); i++)
{
    if (mostVisited.get(i).counter > mostVisited.get(mostVis).counter)
    {
        mostVis = i;
    }
}

// finally get the Trio from the mostVisited ArrayList at the index mostVis
Trio finalTrio = mostVisited.get(mostVis);
Maintain a map user_map[key]:
key = userid
user_map[key] = page sequence queue of max size 3
Additionally maintain a map visit_count_map[key]:
key = page sequence queue of size 3
visit_count_map[key] = visit count for the page sequence denoted by key
For each line in log file:
    user = userid
    page = pageid
    if user_map contains user:
        if user_map[user].size() == 3:
            user_map[user].pop()    # drop the oldest page
        user_map[user].push(page)
        if user_map[user].size() == 3:
            visit_count_map[ user_map[user] ]++
    else:
        user_map[user] = new queue of max size 3
        user_map[user].push(page)
print the key with maximum value in visit_count_map[]
Hash table can be used instead of map.
Time Complexity : O( L * (log n + log(k^3)) + k^3 ) with a tree-based map, where L = number of lines in the log file, n = number of users, k = number of page types;
O( L + k^3 ) expected with a hash table.
Space Complexity : O(k^3) for the sequence counts + O(n) for the per-user queues.
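A minimal Python sketch of the two-map approach above, using a deque with maxlen=3 as the per-user page queue (the sample log and ids are made up):

```python
from collections import deque, defaultdict

# (userid, pageid) pairs, one per log line, in chronological order
log = [("u1", "A"), ("u1", "B"), ("u1", "C"), ("u1", "D"),
       ("u2", "B"), ("u2", "C"), ("u2", "D")]

user_map = {}                       # userid -> last 3 pages visited
visit_count_map = defaultdict(int)  # 3-page sequence -> visit count

for user, page in log:
    if user not in user_map:
        user_map[user] = deque(maxlen=3)   # oldest page dropped automatically
    user_map[user].append(page)
    if len(user_map[user]) == 3:
        visit_count_map[tuple(user_map[user])] += 1

# the key with the maximum value is the most common 3-page sequence
print(max(visit_count_map, key=visit_count_map.get))   # ('B', 'C', 'D')
```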
My Python code, according to chatbot:
input = [ {"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "ttads", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Product Details"},
{"user_id" : "ttads", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Product Details"},
{"user_id" : "ttads", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Product Details"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Home Page"},
{"user_id" : "tttad", "page-type-id":"Home Page"}
];
seq_rank = {}
user_set = {}
page_type_inv = ["Home Page", "Product Details"]
page_type = {}
for index in range(len(page_type_inv)):
    page_type[page_type_inv[index]] = index
page_type_size = len(page_type_inv)
print(page_type)
# page_type = {"Home Page": 0,
#              "Product Details": 1}
print(page_type_size)
for item in input:
    if item["user_id"] in user_set:
        # keep the last two pages, shift left one base-K digit, append the new page
        user_set[item["user_id"]]["page_type_fifo"] %= page_type_size * page_type_size
        user_set[item["user_id"]]["page_type_fifo"] *= page_type_size
        user_set[item["user_id"]]["page_type_fifo"] += page_type[item["page-type-id"]]
        user_set[item["user_id"]]["count"] += 1
        if user_set[item["user_id"]]["count"] > 2:
            if user_set[item["user_id"]]["page_type_fifo"] in seq_rank:
                seq_rank[user_set[item["user_id"]]["page_type_fifo"]] += 1
            else:
                seq_rank[user_set[item["user_id"]]["page_type_fifo"]] = 1
    else:
        user_set[item["user_id"]] = {}
        user_set[item["user_id"]]["page_type_fifo"] = page_type[item["page-type-id"]]
        user_set[item["user_id"]]["count"] = 1
print(seq_rank)
rank_max = 0
rank_max_key = 0
for key in seq_rank:
    if rank_max < seq_rank[key]:
        rank_max = seq_rank[key]
        rank_max_key = key
third_page_type = page_type_inv[rank_max_key % page_type_size]
rank_max_key //= page_type_size
second_page_type = page_type_inv[rank_max_key % page_type_size]
first_page_type = page_type_inv[rank_max_key // page_type_size]
print("Max Order Seq is:", first_page_type, "==>", second_page_type, "==>", third_page_type)
print(input)
First solution:
Keep a hash of the most recent pages visited by each customer:
customer_name -> [page_name, page_name, page_name]
Keep a hash of page-tuples and their frequency:
(page_name, page_name, page_name) -> count
Refinement:
Build ids for customers and pages on the fly with two hashes and two integers:
customer_name -> customer_id (w/max_customer_id)
page_name -> page_id (w/max_page_id)
Since it's the same pattern for both, you can encapsulate the id generation with a class that puts a light wrapper around a hash, with an API of get_id_for_name(name) -> id and get_name_for_id(id) -> name.
Then your other data structures become more compact (and probably faster, depending on the hash implementation):
customer_id -> [page_id, page_id, page_id]
(page_id, page_id, page_id) -> count
For each line of the file, extract the customer name and page name, and then get their ids. Then update the customer_id's pages, truncating the array as needed. Then update the hash of counts.
At the end sort the values of the count hash to get the most popular page sequence, and then convert the ids back to full names for the final output.
P.S. If you use the id approach, you'll be tempted to maintain a second data structure for reverse lookups, but it's not worth it if you're only ever going to do one reverse lookup. If the problem statement evolves and you end up wanting multiple reverse lookups, then maintain a second data structure for them; and since the ids are sequential, you'd prefer a vector to a hash.
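The id-generation wrapper described above might look like this in Python; the class name and example names are assumptions, but the API matches the get_id_for_name / get_name_for_id sketch:

```python
class IdRegistry:
    """Light wrapper around a hash that hands out sequential ids for names."""

    def __init__(self):
        self._id_for_name = {}   # name -> id
        self._names = []         # index i holds the name for id i (a vector, not a hash)

    def get_id_for_name(self, name):
        if name not in self._id_for_name:
            self._id_for_name[name] = len(self._names)
            self._names.append(name)
        return self._id_for_name[name]

    def get_name_for_id(self, id):
        return self._names[id]

customers = IdRegistry()
print(customers.get_id_for_name("alice"))   # 0
print(customers.get_id_for_name("bob"))     # 1
print(customers.get_id_for_name("alice"))   # 0 again, already registered
print(customers.get_name_for_id(1))         # bob
```

Because ids are handed out sequentially, the reverse lookup is a plain list indexed by id, which is the vector-over-hash choice the P.S. recommends.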
OK, my answer goes like this:
Maintain a hash of users to the pages they have visited:
hash<userid, arraylist<visitsequence>>
Now let's say user 234 has visited the sequence 1-6-5-3.
We add pages to the array list as they are visited. When the visit sequence's size reaches 3, we take that sequence as a key and increment its count in a second hashmap<visitsequence, count>, then remove the first entry (by doing visitsequence.remove(0)) from the original hash map. This is so the other overlapping sequences get counted too.
For example, user 234 visits 1-5-6-7:
when the sequence becomes 1-5-6, we put it in the second hashmap and increment its count there; removing the first entry leaves 5-6; when the user next visits 7, the sequence becomes 5-6-7, so that too gets counted in the second hashmap.
So this is my take on this:
- find the most visited page
- find the next most visited page, with the condition that the second page must come right after the 1st page, since the log is chronological
- repeat the second step for the 3rd page
If other pages have the same count, repeat the process for them too, but I think the answer should come up the first time.
1. encode each possible sequence of pages as a number. If we have K possible page types, our number will look like nnn where 0 <= n < K, i.e. a 3-digit number in base K. For example, if we had only 3 types of pages (A, B and C), we could assign A->0, B->1, C->2. So our base would be 3, and the B->A->C transition would be encoded as 1*3^2 + 0*3^1 + 2*3^0 = 102 (base 3) = 11 (decimal) = b (hex).
- chatbot March 14, 2013
2. create an array of ints of K*K*K size, initialized to 0. So if we have "several dozen" page types (several meaning 4, for example, i.e. 12*4 = 48 types), we would need 48^3 array elements (~110K).
2'. Or, if we don't want to allocate memory for all possible sequences, keep a map <sequence id> -> <number of times>.
3. sequentially scan the log, keeping track of and updating each user's encoded last-3-pages state (i.e. keep it in a map <user id> -> encoded sequence). Increment the corresponding cell in the array (or map) created at [2].
4. select the greatest value from the array (or map) and decode its position (id) back into a page sequence.
Overall O(n) in the number of log lines.
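The base-K encoding from step 1 can be checked with a short sketch, using the A, B, C example (helper names are mine):

```python
pages = ["A", "B", "C"]                    # A->0, B->1, C->2
code = {p: i for i, p in enumerate(pages)}
K = len(pages)

def encode(seq):
    # nnn in base K: the first page is the most significant digit
    n = 0
    for p in seq:
        n = n * K + code[p]
    return n

def decode(n):
    # peel off base-K digits from least to most significant, then reverse
    seq = []
    for _ in range(3):
        seq.append(pages[n % K])
        n //= K
    return seq[::-1]

# B->A->C = 1*3^2 + 0*3^1 + 2*3^0 = 11 decimal, i.e. 102 in base 3 or b in hex
print(encode(["B", "A", "C"]), hex(encode(["B", "A", "C"])))   # 11 0xb
print(decode(11))                                              # ['B', 'A', 'C']
```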