Amazon Interview Question for Software Engineer / Developers


Country: United States




Comment hidden because of low score. Click to expand.
7
of 7 vote

1. encode each possible sequence of pages in a number. If we have K possible page types, our number will look like nnn where 0<n<K. We can treat this number as base K. For example, if we had only 3 types of pages (A, B and C), we could assign A->0, B->1, C->2. So our base would be 3, and B->A->C transition would be encoded as 1*3^2+0*3^1+2*3^0=102(base3) or b(hex).

2. create an array of ints of KKK size, initialize all with 0. So if we have "several dozen" of page types (several means 4 for example), we would need (12*4)^3 array elements (~110K).
2'. Or if we don't want to allocate memory for all possible sequences, have a set <sequence id> -> <number of times>

3. sequentially scan the log keeping track and updating KKK state per user (i.e. keep it in a set <user id>->kkk). Increment corresponding cell in array (or set) created at [2].

4. select greatest value from the array (or set), decode it's position (id) into page sequence

O(n)

- chatbot March 14, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

You could download the sample log file at:

db.tt / LobF70Rt

- jimmy514in March 14, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

put the data in a hash with key as customer id. the value for the hash (chained) since there will be multiple entries will be the pages he visited
Let say I number pages
Homepage = 1
Shopping cart = 2
Listmania = 3

then for customer with id let say 1130 hash will have an entry like 1->4->3->2
after creating a hash scan all the customers and build a max heap for the count of seq
like for 1->4->3->2 seq will be
1->4->3 i.e 143
4->3->2 i.e 432

- DashDash March 14, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

// create arraylist to handle page trios

ArrayList<Trio> mostVisited = new ArrayList<Trio>();

class Trio {
	int counter = 1;
	int p1, p2, p3; 		// respective pages visited

	setP1(int uniqueID)			// copy for setP2 & setP3
	{
		p1 = uniqueID;
	}
}

// after a trio of pages has been visited and their uniqueID set to p1, p2, p3
// check if that trio exists within arraylist

if(mostVisited.contains(t))	// if it exists increment
{
	mostVisited.get(t).counter++;
}
else						// or add the trio the arraylist
{
	mostVisited.add(t);
}

// then . . . 

// to query the most visited site you would use a for loop or 0(1)

int mostVis = 0;
for(int i = 0; i < mostVisited.size(); i++)
{
	if(mostVisited.get(i).counter > mostVis)
	{	
		mostVis = i;
	}
}


// finally get the Trio from the mostVisited arraylist at the index mostVis

Trio finalTrio = mostVisited.get(mosVis);

- Chris Sullivan March 14, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

Sorry the for loop in the middle would be 0(n) *'-'*

- Chris Sullivan March 19, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Maintain a map user_map[key]
key = userid
user_map[key] = page sequence queue of max size 3

Additionally maintain a map visit_count_map[key]
key = page sequence queue of size 3
visit_count_map[key] = visit count for page sequence queue denoted by key

For each line in log file:
	user = userid
	page = pageid
	
	if user_map contains user:
		if user_map[user].size() < 3:
			user_map[user].push(page)
		else:
			user_map(user).pop()
			user_map(user).push(page)
		
		if( user_map[user].size() == 3:
			visit_count_map[ user_map[user] ]++;
	else
		user_map[user].push(page)

print the key with maximum value in visit_count_map[]

Hash table can be used instead of map.

Time Complexity : O[ (no of lines in log file)*log(n)*log(3*k^3) + k^3 ] for map
O[ (no of lines in log file)*log(3*k^3) + 3*k^3 ] for perfect hash table

Space Complexity : O(3*k^3) + O(3*n)

- Cerberuz March 14, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

Any java implementation for this ?

- BV October 24, 2014 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

My python code according to charbot:

input = [ {"user_id" : "tttad", "page-type-id":"Home Page"},
          {"user_id" : "ttads", "page-type-id":"Product Details"},
          {"user_id" : "tttad", "page-type-id":"Home Page"},
          {"user_id" : "tttad", "page-type-id":"Product Details"},
          {"user_id" : "ttads", "page-type-id":"Product Details"},
          {"user_id" : "tttad", "page-type-id":"Home Page"},
          {"user_id" : "tttad", "page-type-id":"Product Details"},
          {"user_id" : "ttads", "page-type-id":"Product Details"},
          {"user_id" : "tttad", "page-type-id":"Home Page"},
          {"user_id" : "tttad", "page-type-id":"Product Details"},
          {"user_id" : "tttad", "page-type-id":"Product Details"},
          {"user_id" : "tttad", "page-type-id":"Product Details"},
          {"user_id" : "tttad", "page-type-id":"Product Details"},
          {"user_id" : "tttad", "page-type-id":"Product Details"},
          {"user_id" : "tttad", "page-type-id":"Product Details"},
          {"user_id" : "tttad", "page-type-id":"Home Page"},
          {"user_id" : "tttad", "page-type-id":"Home Page"},
          {"user_id" : "tttad", "page-type-id":"Home Page"},
          {"user_id" : "tttad", "page-type-id":"Home Page"},
          {"user_id" : "tttad", "page-type-id":"Home Page"},
          {"user_id" : "tttad", "page-type-id":"Home Page"},
          {"user_id" : "tttad", "page-type-id":"Home Page"},
          {"user_id" : "tttad", "page-type-id":"Home Page"}
          ];

seq_rank={};
user_set={};
page_type_inv=["Home Page", "Product Details"];
page_type={};
index=0;
page_type_size=0;
for index in range(len(page_type_inv)):
  page_type[page_type_inv[index]]=index;
  page_type_size=index;
page_type_size+=1;
print page_type;
#page_type={"Home Page":1,
#           "Product Details": 2,
#           "type size !!!": 3};
print page_type_size;

for item in input:
  if item["user_id"] in user_set.keys():
    user_set[item["user_id"]]["page_type_fifo"]%=page_type_size*page_type_size;
    user_set[item["user_id"]]["page_type_fifo"]*=page_type_size;
    user_set[item["user_id"]]["page_type_fifo"]+=page_type[item["page-type-id"]];
    user_set[item["user_id"]]["count"]+=1;
    if user_set[item["user_id"]]["count"] > 2 :
      if user_set[item["user_id"]]["page_type_fifo"] in seq_rank.keys():
        seq_rank[user_set[item["user_id"]]["page_type_fifo"]]+=1;
      else:
        seq_rank[user_set[item["user_id"]]["page_type_fifo"]]=1;
  else:
    user_set[item["user_id"]]={};
    user_set[item["user_id"]]["page_type_fifo"]=page_type[item["page-type-id"]];
    user_set[item["user_id"]]["count"]=1;
print seq_rank;
rank_max=0;
rank_max_key=0;
for key in seq_rank.keys():
  if rank_max < seq_rank[key] :
    rank_max=seq_rank[key];
    rank_max_key=key;
third_page_type=page_type_inv[rank_max_key%page_type_size];
rank_max_key/=page_type_size;
second_page_type=page_type_inv[rank_max_key%page_type_size];
first_page_type=page_type_inv[rank_max_key/page_type_size];

print "Max Order Seq is:", first_page_type, "==> ", second_page_type, "==> ", third_page_type;

print input;

- chenlc626 March 14, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

First solution:

Keep a hash of the most recent pages visited by each customer:

customer_name -> [page_name, page_name, page_name]

Keep a hash of page-tuples and their frequency:

(page_name, page_name, page_name) -> count

Refinement:

Build ids for customers and pages on the fly with two hashes and two integers:

customer_name -> customer_id (w/max_customer_id)
  page_name -> page_id (w/max_page_id)

Since it's the same pattern for both, you can encapsulate the id generation with a class that puts a light wrapper around a hash, with an API of get_id_for_name(name) -> id and get_name_for_id(id) -> name.

Then your other data structures become more compact (and probably faster, depending on the hash implementation):

customer_id -> [page_id, page_id, page_id]
  (page_id, page_id, page_id) -> count

For each line of the file, extract the customer name and page name, and then get their ids. Then update the customer_id's pages, truncating the array as needed. Then update the hash of counts.

At the end sort the values of the count hash to get the most popular page sequence, and then convert the ids back to full names for the final output.

- showell30@yahoo.com March 15, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

P.S. If you use the id approach, then you'll be tempted to maintain a second data structure for reverse lookups, but it's not worth it if you're only ever gonna do one reverse lookup. If the problem statement evolves and you're gonna want to do multiple reverse lookups, then you can maintain a second data structure for the eventual reverse lookups, and since ids are sequential, you'd prefer a vector to a hash.

- showell30@yahoo.com March 15, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Ok my answer goes like this
maintain a hash of users and pages visited

hash<userid,arraylist<visitsequence>>
now lets say user 234 has visited the sequence 1-6-5-3

now we will be adding the pages in the array list as the pages are being visited when the
visitsequence size becomes equal to 3 we take this visit sequence as key and put it in
another hashmap<visitsequence,count> then remove the first entry (by doing visitsequence.remove(0)) in the original hash map this is to count the other sequences

for ex

user 234 visits 1-5-6-7

as the sequence becomes 156, we put it in second hashmap and increase the sequence count in secound hashmap when we remove first entry in 156 it will become 56 next when user visits 7 we will have sequence 567..so even this can be put in our second hashmap

- dude March 15, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

isn't this similar to max heap solution given by dashdash

- jigi March 20, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

chude dao!

- Hello world August 09, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
-1
of 1 vote

so this is my say on this,

- find the most visited page
- find the next most visited page, condition being that the second page should come next to the 1st page, as the log is chronological.
- repeat the second process for the 3rd page.

Repeat the process for other pages that have the same count but i think it should come up the first time.

- jayant March 14, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

chude dao

- Hello world August 09, 2013 | Flag


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More