• 0

Country: United States

Comment hidden because of low score. Click to expand.
1
of 1 vote

Taking a shot at this. We need to have a lookup table where they key is userId and value is a list of the websites that user is subscribed to. Let's assume we assign each user a 32-bit(4 byte) int user iD. Let's also assign 4 byte int iDs to the websites. If in the worst case a user is subscribed to all websites, a single key-value pair will take up 16Kb of space we have a total of 16Kb * 10^9 users = 1.6 * 10^12 bytes of data (1.6TB). Assume that a single server can store 8GB of data then 1.6 * 10^12 / 8 * 10^9 = 2 * 10^3 servers ( which is what we have). This allows us to store data on 500K users per server. Each server can store a range of User IDs (ex. server 1 stores IDs from 1...499 999). When a user unsubscribes based on his userId we can navigate to the correct server containing the corresponding look-up table. In the look up table delete the ID of the website he wishes to unsubscribe from.

Comment hidden because of low score. Click to expand.
0
of 0 vote

This is Information retrieval question.

What you probably got to do is create a data structure which will capture the user input (unsubs / subs), and subsequently apply to the existing user data in certain scalable fashion which varies on what basis the data is being used.

Any descriptive question, would love to have a personal Interview then typing

Comment hidden because of low score. Click to expand.
0
of 0 vote

bigtable and hbase

Comment hidden because of low score. Click to expand.
0
of 0 vote

Since partitioning based on website may stress some servers more, we can use a hash to map (user, website) to a particular server. There were can have a list(hash map for efficient searching)per user. Retrieval would be a map reduce job.

Comment hidden because of low score. Click to expand.
0
of 0 vote

Few clarifications needed :

1. When we say performant, are we saying number of users (of a website) can grow much beyond 2k ? or is it more of "Intra-website" performance needed as in website content is to be interacted with, in a faster way ?

2. Scalable means number of websites can potentially increase ? or overall number of users can go beyond 2k ?

Accordingly, one basic design could be to 'deploy' each website on two servers one acting as primary and other secondary, with a 'sync' mechanism in place. Each server(hosting a given site) stores its subscribed user list (above idea of userId is good one i guess).

Pros - Entire system could be fault-tolerant. Every primary is backed by secondary server.
Performance could be addressed using load-balancers installed on both servers, redirecting traffic appropriately.

Cons - If a user is subscribed to all site, his info is stored on all servers, redundant.

Thanks.

Comment hidden because of low score. Click to expand.
-1
of 1 vote

Performant is not a word, unless you mean an actor, someone who performs.

Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.