Interview Question
Software Engineer / Developers
Country: United States
Interview Type: In-Person
Analyze the "To:" fields in the data set and write down every combination of recipients that occurs. This then becomes a probability question within the space derived above: if Jane is added to the "To:" field, which other contacts most often appear alongside her? The probability threshold can be adjusted so that only a limited set of contacts is suggested once Jane has been added.
We are interested in P(B|A), the probability that the user adds B to the "To:" field given that A is already in the "To:" field. The higher this probability, the more appropriate it is for Gmail to suggest adding B once A has been added.
The architecture is that we store the following -
1. Total number of emails sent by the user - N
2. Total number of emails sent to some contact A, i.e. A is in the "To:" field - E(A)
3. A graph with O(m^2) edges, where m is the number of contacts, with an edge between every pair of contacts. Each edge stores P(B&A), i.e. the probability that B and A were in the "To:" field together. This could be kept in a persistent store (like a SQL DB).
Operations -
1. User sends an email -
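- Increment the total mail count, N.
- For each contact c in the "To:" field, increment E(c).
- For each pair of contacts (c1, c2) in the "To:" field, calculate the new P(c1&c2) as follows, where N is the mail count before this email was added -
P(c1&c2) = (Total number of emails with c1 and c2)/(Total number of emails)
= E(c1 and c2) / N
E(c1 and c2) = P(c1&c2) * N
P'(c1&c2) = (E(c1 and c2) + 1)/(N+1)
- For every pair that is not in this email, the count is unchanged but the total grows, so P'(c1&c2) = E(c1 and c2)/(N+1). Storing the raw count E(c1 and c2) on each edge and dividing by N only when needed avoids touching those edges on every send.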
2. The user is composing an email and types an email address A in the "To:" field.
For each contact B, calculate P(B|A) = P(B&A)/P(A)
= P(B&A)/ ( E(A)/N )
= E(A and B) / E(A)
Sort these probabilities and report the top three that cross a certain threshold (e.g. a minimum 60% probability that B should be included).
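A minimal Python sketch of the two operations above. It assumes raw pair counts E(c1 and c2) are stored instead of P(c1&c2), which works because P(B&A)/(E(A)/N) reduces to E(A and B)/E(A). The class name ToFieldModel, its method names and the sample contacts are illustrative, not part of the original answer, and the in-memory counters stand in for the persistent store.

```python
from collections import Counter
from itertools import combinations


class ToFieldModel:
    """Illustrative in-memory stand-in for the stored counts."""

    def __init__(self):
        self.n = 0                 # N: total emails sent by the user
        self.sent_to = Counter()   # E(c): emails with c in the "To:" field
        self.together = Counter()  # E(c1 and c2), keyed by a frozenset pair

    def record_email(self, recipients):
        """Operation 1: update N, E(c) and the count of every pair."""
        unique = sorted(set(recipients))
        self.n += 1
        for c in unique:
            self.sent_to[c] += 1
        for c1, c2 in combinations(unique, 2):
            self.together[frozenset((c1, c2))] += 1

    def suggest(self, a, threshold=0.6, top=3):
        """Operation 2: rank contacts B by P(B|A) = E(A and B) / E(A)."""
        if self.sent_to[a] == 0:
            return []
        scored = []
        # Scans every stored pair; a real store would index edges by contact.
        for pair, count in self.together.items():
            if a in pair:
                (b,) = pair - {a}
                p = count / self.sent_to[a]
                if p >= threshold:
                    scored.append((b, p))
        # Highest probability first, ties broken by name.
        scored.sort(key=lambda item: (-item[1], item[0]))
        return scored[:top]


model = ToFieldModel()
model.record_email(["jane", "bob", "carol"])
model.record_email(["jane", "bob"])
model.record_email(["jane", "dave"])
model.record_email(["bob", "erin"])
print(model.suggest("jane"))  # only bob clears 60%: P(bob|jane) = 2/3
```

With the threshold lowered to 0.3, carol and dave (each at 1/3) would be suggested as well.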
This model can be extended to reflect group dynamics, that is, P(B | A and C and D). My statistics fail me beyond the simple model I've explained above.
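One possible way to get P(B | A and C and D) straight from the definition of conditional probability is to count, among past emails that contain all of the chosen contacts, how often B also appears. The sketch below assumes the full recipient list of every past email is kept (the hypothetical history argument) instead of only pairwise edges; the function name and sample data are made up for illustration. With little data, very few emails will match a large chosen set, so the estimates get noisy.

```python
from collections import Counter


def conditional_suggestions(history, chosen, threshold=0.6):
    """Estimate P(B | all of `chosen`) as
    (#emails containing chosen and B) / (#emails containing chosen)."""
    chosen = frozenset(chosen)
    matching = [set(r) for r in history if chosen <= set(r)]
    if not matching:
        return []
    total = len(matching)
    # Count every other contact that appears alongside the chosen set.
    counts = Counter(c for r in matching for c in r - chosen)
    scored = [(c, k / total) for c, k in counts.items() if k / total >= threshold]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


history = [
    ["a", "c", "d", "b"],
    ["a", "c", "d", "b"],
    ["a", "c", "d"],
    ["a", "c", "e"],
    ["a", "b"],
]
print(conditional_suggestions(history, ["a", "c", "d"]))  # b appears in 2 of 3 matches
```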
1. Run item set mining on the data to find frequent sets.
2. Build an inverted index of receivers -> sets.
3. When a sender types receivers, intersect the index entries for those receivers and suggest further receivers from the matching sets.
- Curious July 16, 2012
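The three steps above can be sketched roughly as follows. This is a simplification under stated assumptions: instead of a full item set mining algorithm such as Apriori or FP-growth, it only treats whole recipient lists seen at least min_support times as frequent sets. All function names, the min_support parameter and the sample history are hypothetical.

```python
from collections import Counter, defaultdict


def frequent_sets(history, min_support=2):
    """Step 1 (simplified): keep recipient sets seen at least min_support times."""
    counts = Counter(frozenset(r) for r in history)
    return [s for s, k in counts.items() if k >= min_support]


def build_index(sets):
    """Step 2: inverted index from receiver to the ids of sets containing it."""
    index = defaultdict(set)
    for i, s in enumerate(sets):
        for receiver in s:
            index[receiver].add(i)
    return index


def suggest(typed, sets, index):
    """Step 3: intersect the index entries of the typed receivers and
    return the remaining members of the matching sets."""
    typed = set(typed)
    if not typed:
        return set()
    ids = set.intersection(*(index.get(r, set()) for r in typed))
    out = set()
    for i in ids:
        out |= sets[i] - typed
    return out


history = [
    ["jane", "bob", "carol"],
    ["jane", "bob", "carol"],
    ["jane", "dave"],
    ["jane", "dave"],
    ["bob", "erin"],
]
sets = frequent_sets(history)
index = build_index(sets)
print(sorted(suggest(["jane"], sets, index)))
print(sorted(suggest(["jane", "bob"], sets, index)))
```

Typing more receivers narrows the intersection, so the suggestions become more specific as the "To:" field fills up.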