Amazon Interview Question for Software Engineer / Developers


Team: SDE
Country: India
Interview Type: In-Person




2 of 2 votes

@vran.freelancer
Let me first clarify what your solution does. HashCID takes a CustomerId as input and returns a hashtable that records which PageId was referenced on which day. For example:
C1 accesses P1 on D1
C1 accesses P1 on D1
C1 accesses P2 on D1
C2 accesses P1 on D1
C3 accesses P2 on D1

HashCID will look like:
<C1, [<P1,D1>,<P2,D1>]>
<C2, <P1,D1>>
<C3, <P2,D1>>

HashPID will look like:
<D1,5>

From the problem, we are expected to find the number of distinct customers, each of whom has accessed at least 'p' distinct pages in T days.

Your solution does not fulfil this requirement. I think the second hashtable should be HashPID (PID, <Day, Count>); otherwise, counting accesses to a page with respect to a customer will not be possible.
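
A rough Java rendering of the two tables as described here (class and method names are illustrative, not from the original comment):

import java.util.HashMap;
import java.util.Map;

class AccessTables {
    // HashCID: CustomerID -> (PageID -> day of access); repeat visits to the
    // same page collapse into one entry
    Map<String, Map<String, String>> hashCID = new HashMap<>();
    // HashPID as proposed above: PageID -> (day -> access count)
    Map<String, Map<String, Integer>> hashPID = new HashMap<>();

    void record(String customer, String page, String day) {
        hashCID.computeIfAbsent(customer, c -> new HashMap<>()).put(page, day);
        hashPID.computeIfAbsent(page, p -> new HashMap<>()).merge(day, 1, Integer::sum);
    }
}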

- Learner February 03, 2012 | Flag Reply
0 of 0 votes

What about the following solution:

First we scan the whole log and extract the tuples for the required T days. Next we build a hash PageID -> list<CustomerID> from this information, ignoring duplicates. For example, we might get:

P1 -> C1 C2 C3 C5
P2 -> C1 C3 C6 C7
P3 -> C1 C2 C5 C4

Then, suppose we need to find all customers who visited at least 2 distinct pages. For that we scan the above hash and build a new hash CustomerID -> PageID, i.e.:

C1 -> P1 P2 P3
C2 -> P1 P3
C3 -> P1 P2
C4 -> P3
C5 -> P1 P3
C6 -> P2
C7 -> P2

Finally, select those CustomerIDs which have at least 2 pages in their lists.
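
A minimal Java sketch of this two-hash approach, assuming the (CustomerID, PageID) pairs have already been filtered to the T-day window (class and variable names are mine):

import java.util.*;

class TwoHashApproach {
    // entries: (CustomerID, PageID) pairs from the required T days
    static Set<String> customersWithAtLeast(List<String[]> entries, int p) {
        // Pass 1: PageID -> set of CustomerIDs (a set drops duplicate visits)
        Map<String, Set<String>> customersByPage = new HashMap<>();
        for (String[] e : entries) {
            customersByPage.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        // Pass 2: invert into CustomerID -> set of PageIDs
        Map<String, Set<String>> pagesByCustomer = new HashMap<>();
        for (Map.Entry<String, Set<String>> en : customersByPage.entrySet()) {
            for (String cust : en.getValue()) {
                pagesByCustomer.computeIfAbsent(cust, k -> new HashSet<>()).add(en.getKey());
            }
        }
        // Finally, select the customers with at least p distinct pages
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Set<String>> en : pagesByCustomer.entrySet()) {
            if (en.getValue().size() >= p) {
                result.add(en.getKey());
            }
        }
        return result;
    }
}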

- Anonymous February 09, 2012 | Flag
0 of 0 votes

I did not understand the need for the hash PageID -> list<CustomerID>. I think the hash CustID -> list<PageID> alone would do. And how about using a map here?

- Anonymous February 15, 2012 | Flag
0 of 0 votes

With log files, you won't be able to do straight-up hashing in RAM. You can do it the Hadoop way: with an external sort and merge.
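
A sketch of the streaming pass that would follow such an external sort, assuming the records have already been filtered to the T-day window and sorted by (CustomerID, PageID); the one-record-per-line format is my assumption:

import java.io.BufferedReader;
import java.io.IOException;

class SortedLogScan {
    // Counts customers with at least p distinct pages, reading lines of the
    // form "customerId pageId" from input already sorted by customer, then page.
    // Uses O(1) memory regardless of log size, since duplicates are adjacent.
    static int countQualifying(BufferedReader sorted, int p) throws IOException {
        String prevCust = null, prevPage = null;
        int distinctPages = 0, qualifying = 0;
        String line;
        while ((line = sorted.readLine()) != null) {
            String[] f = line.trim().split("\\s+");
            String cust = f[0], page = f[1];
            if (!cust.equals(prevCust)) {        // a new customer's run begins
                if (prevCust != null && distinctPages >= p) qualifying++;
                prevCust = cust;
                prevPage = null;
                distinctPages = 0;
            }
            if (!page.equals(prevPage)) {        // first sighting of this page in the run
                distinctPages++;
                prevPage = page;
            }
        }
        if (prevCust != null && distinctPages >= p) qualifying++;  // flush last customer
        return qualifying;
    }
}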

- memo February 03, 2012 | Flag Reply
0 of 0 votes

You might still be able to bring everything into RAM. It depends on how much data you have.

- eugene.yarovoi February 03, 2012 | Flag
0 of 0 votes

With 2 hashtables: 1) HashCID<int CID, HashTable<PID, Day>> and 2) HashPID<Day, Count>.

- vran.freelancer February 03, 2012 | Flag Reply
0 of 0 votes

I understand that the question is about data structures, but I tried to approach it from a DB perspective.

Table:

create table Pages(Date int, CustID int, PageID int);

I did not consider multiple visits to the same page, as they do not add any value to the question asked.

Query to get the output

SELECT CustID, COUNT(DISTINCT PageID) AS NoOfPages FROM Pages WHERE Date BETWEEN D1 AND D2 GROUP BY CustID HAVING COUNT(DISTINCT PageID) >= p;

a) All the distinct customers are grouped.
b) For each customer, the number of distinct pages visited between the selected dates (the T days) is computed.
c) The HAVING clause trims the result set, removing customers who have visited fewer than p pages.

Create a temp table with the result recordset and count the number of customers in it (or simply wrap the query above in a SELECT COUNT(*) subquery). Output that count.

But I am struggling to convert the SQL structure into an MT data structure. Is there a data structure which can extend to n columns instead of a key/value pair?

- Anonymous February 04, 2012 | Flag Reply
0 of 0 votes

We can create a Map<CID, List<PID>> from the logs of the T days. Then we can traverse the map and, whenever the size of a CID's List<PID> is at least p, output that CID.
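
A sketch of this, using a Set rather than a List so that repeat visits to the same page are not double-counted (that substitution, and the names, are mine):

import java.util.*;

class MapOfPages {
    // logEntries: (CID, PID) pairs taken from the logs of the last T days
    static List<String> customersWithAtLeast(List<String[]> logEntries, int p) {
        Map<String, Set<String>> pagesByCustomer = new HashMap<>();
        for (String[] e : logEntries) {
            pagesByCustomer.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, Set<String>> en : pagesByCustomer.entrySet()) {
            if (en.getValue().size() >= p) output.add(en.getKey());  // qualifying CID
        }
        return output;
    }
}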

- Lalit February 16, 2012 | Flag Reply
0 of 0 votes

(Approach assuming we can get the logs into memory completely)
Suppose we have to find the number of distinct customers visiting pages today. This can be captured in a HashMap<CustomerID, PageID>.

Now, if we key all these per-day maps by date, we get the required information as HashMap<Date, HashMap<CustomerID, PageID>>, and can extract what we need from it.
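
A rough sketch of this nested structure, with the inner value widened to a Set<PageID>, since one customer may visit several pages on the same date (that widening is my assumption; the original pairs each customer with a single PageID):

import java.util.*;

class PerDateIndex {
    // Date -> (CustomerID -> distinct PageIDs visited on that date)
    Map<String, Map<String, Set<String>>> byDate = new HashMap<>();

    void record(String date, String customer, String page) {
        byDate.computeIfAbsent(date, d -> new HashMap<>())
              .computeIfAbsent(customer, c -> new HashSet<>())
              .add(page);
    }

    // Number of distinct customers who visited any page on the given date
    int distinctCustomersOn(String date) {
        return byDate.getOrDefault(date, Collections.emptyMap()).size();
    }
}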

- ashu February 18, 2012 | Flag Reply

