Apple Interview Question for SDE-3s

Country: United States
Interview Type: In-Person

II. Someone put DISTRIBUTE BY Random()*ID in a Hive script to prevent data skew. What would be the problem here?

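The problem is that rows with the same ID get different partition numbers under Random()*ID, so they are sent to different reducers. Any aggregation keyed on ID then produces incorrect results: each reducer sees only part of an ID's rows and emits a partial aggregate instead of one result per ID.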
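To make the failure concrete, here is a small Python simulation. It is our own sketch, not Hive code. It assumes rows are just user IDs, uses a hypothetical NUM_REDUCERS = 4, and approximates Hive's partitioning with a simple modulo. Partitioning by the ID keeps every ID on one reducer. A Random()*ID key scatters one ID's rows, so each reducer can only emit a partial count. merge_partials shows the extra aggregation pass that any fix of this kind needs (often called two-stage or salted aggregation); that remedy is our addition, not part of the answer above.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

NUM_REDUCERS = 4  # hypothetical reducer count for the simulation


def partition_by_id(user_id: int) -> int:
    # Deterministic key: every row with the same ID goes to the same reducer.
    return hash(user_id) % NUM_REDUCERS


def partition_by_random_times_id(user_id: int, rng: random.Random) -> int:
    # Rough stand-in for DISTRIBUTE BY Random()*ID (simplified, not Hive's exact
    # hashing): the key changes from row to row, so rows sharing an ID scatter.
    return int(rng.random() * user_id) % NUM_REDUCERS


def count_per_reducer(rows: Iterable[int],
                      partitioner: Callable[[int], int]) -> List[Tuple[int, int]]:
    """Shuffle rows with `partitioner`, then run COUNT(*) GROUP BY id on each reducer."""
    reducers: Dict[int, List[int]] = defaultdict(list)
    for user_id in rows:
        reducers[partitioner(user_id)].append(user_id)
    output: List[Tuple[int, int]] = []
    for slice_ in reducers.values():
        # A reducer only sees its own slice, so it can only emit partial counts.
        output.extend(Counter(slice_).items())
    return output


def merge_partials(partials: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    # Second aggregation pass: sum the partial counts per ID.
    totals: Counter = Counter()
    for user_id, count in partials:
        totals[user_id] += count
    return dict(totals)


rng = random.Random(0)
rows = [7] * 1000 + [3] * 10  # ID 7 plays the "hot", skewed key
print(sorted(count_per_reducer(rows, partition_by_id)))  # [(3, 10), (7, 1000)]
scattered = count_per_reducer(rows, lambda uid: partition_by_random_times_id(uid, rng))
print(len(scattered))             # more rows than distinct IDs: counts are split
print(merge_partials(scattered))  # totals are correct only after merging
```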

- Mayank Jain August 20, 2017

1. Assume that every line in the input data contains a user ID and a list of product IDs.
In the map phase, for each line we extract every product the user purchased and emit it paired with a count of 1.
For example, the input line CUST_123, PROD_1, PROD_2, PROD_3 gives this result from the map phase: (PROD_1, 1), (PROD_2, 1), (PROD_3, 1).

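In the reduce phase, we collect these pairs from all users, sum the counts for each product, and return the top 100. With more than one reducer, each reducer returns its local top 100 and a final step merges those lists into the global top 100.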
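The map and reduce phases described above can be sketched in plain Python as a single-process model of the data flow (not Hadoop code). It assumes the comma-separated line format from the example and simulates the shuffle with a hash partitioner; the names map_line, shuffle, reduce_partition and top_k_products, and the default of 4 reducers, are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: 'CUST_123, PROD_1, PROD_2' -> ('PROD_1', 1), ('PROD_2', 1)."""
    _user_id, *products = [field.strip() for field in line.split(",")]
    for product in products:
        yield product, 1


def shuffle(pairs: Iterable[Tuple[str, int]], num_reducers: int) -> List[Dict[str, List[int]]]:
    """Route each product key to exactly one reducer, as a MapReduce shuffle would."""
    partitions: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for product, count in pairs:
        partitions[hash(product) % num_reducers][product].append(count)
    return partitions


def reduce_partition(partition: Dict[str, List[int]], k: int) -> List[Tuple[str, int]]:
    """Reduce phase: sum the counts per product and keep this reducer's local top k."""
    totals = {product: sum(counts) for product, counts in partition.items()}
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


def top_k_products(lines: Iterable[str], k: int = 100, num_reducers: int = 4) -> List[Tuple[str, int]]:
    pairs = (pair for line in lines for pair in map_line(line))
    local_tops = [reduce_partition(p, k) for p in shuffle(pairs, num_reducers)]
    # Every product is summed on exactly one reducer, so the global top k
    # is contained in the union of the local top-k lists.
    merged = (item for top in local_tops for item in top)
    return heapq.nlargest(k, merged, key=lambda item: item[1])


if __name__ == "__main__":
    sample = ["CUST_123, PROD_1, PROD_2, PROD_3", "CUST_456, PROD_1, PROD_3", "CUST_789, PROD_1"]
    print(top_k_products(sample, k=2))  # [('PROD_1', 3), ('PROD_3', 2)]
```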

- dhruven91 November 13, 2017

This is how to answer the second question in plain old SQL: join the table with itself on user ID (so that each product a user bought is paired with every other product that user bought). Then drop pairs of the same product and deduplicate (p1, p2) versus (p2, p1) by keeping only the pair whose first product ID is lower:

    SELECT
      FP.product AS product1,
      T.product AS product2,
      COUNT(1) AS bought_count
    FROM Purchases AS FP
    -- the < sign keeps only one of (p1, p2) and (p2, p1) and drops same-product pairs
    INNER JOIN Purchases AS T
      ON FP.user = T.user AND FP.product < T.product
    GROUP BY FP.product, T.product
    ORDER BY bought_count DESC
    LIMIT 100

I have no idea how to do this in Spark, though. The obvious bottleneck is the inner join, which produces a row for every pair of products a user bought; what can we do to optimize it? Maybe the question is about distributing the load evenly among workers, I don't know.
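For comparison with the SQL self-join, the same pair counting can be written as a group-by-user step followed by generating pairs per user. That is roughly the shape one might port to Spark (for example groupByKey then flatMap on an RDD), but the port is our assumption, not something this answer claims. The Python sketch below assumes purchases arrive as (user, product) tuples; top_product_pairs is a hypothetical name. Because products are collected into a set, a user counts each pair at most once, which matches COUNT(1) in the SQL only when Purchases has one row per (user, product).

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def top_product_pairs(purchases: Iterable[Tuple[str, str]], k: int = 100) -> List[Tuple[Pair, int]]:
    """Return the k product pairs bought together by the most users.

    purchases: (user, product) rows, like the Purchases table in the SQL.
    """
    # Group by user: this plays the role of the join key FP.user = T.user.
    products_by_user: Dict[str, Set[str]] = defaultdict(set)
    for user, product in purchases:
        products_by_user[user].add(product)

    pair_counts: Counter = Counter()
    for products in products_by_user.values():
        # sorted() + combinations() yields each unordered pair once with p1 < p2,
        # mirroring FP.product < T.product. Work per user is quadratic in basket size.
        for pair in combinations(sorted(products), 2):
            pair_counts[pair] += 1

    # Equivalent of ORDER BY bought_count DESC LIMIT k.
    return pair_counts.most_common(k)


rows = [("u1", "A"), ("u1", "B"), ("u1", "C"), ("u2", "A"), ("u2", "B"),
        ("u3", "B"), ("u3", "C"), ("u4", "A"), ("u4", "B")]
print(top_product_pairs(rows, k=2))  # [(('A', 'B'), 3), (('B', 'C'), 2)]
```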

- inthecottonfield February 05, 2019
