What constitutes a quote?
A fragment of text longer than some threshold, say 20 words.
How do you find quotes?
Cluster all the web pages. For a given web page, compare it with all other pages in its cluster. This is O(mn).
Store quotes in a DB. The key is the URL; the value is a list of (position, URL) pairs recording where the quote is reproduced (see the sketch below).
How do you make it scale to the web?
There are about 1T web pages on the internet; store all of them in clusters.
How do you handle updates?
Periodically re-crawl the web pages and update the quotes. Given a URL, check whether the quoted positions have changed; if so, recompute the quotes.
How would you arrange the servers?
Two databases: one stores all the clustered web pages, the other stores the quotes.
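A minimal sketch of the quote detection and storage idea, assuming 20-word fragments; note the lookup index here is keyed by a fragment hash (the URL-keyed store described above would be filled in from these hits), an in-memory dict stands in for the DB, and all names are illustrative:

# Minimal sketch: hash 20-word fragments and record where each one appears,
# so reproduced fragments (quotes) can be found. The dict stands in for the DB.
from collections import defaultdict

FRAGMENT_WORDS = 20  # the threshold discussed above

def extract_fragments(text):
    """Yield (position, fragment) pairs of FRAGMENT_WORDS consecutive words."""
    words = text.split()
    for i in range(len(words) - FRAGMENT_WORDS + 1):
        yield i, " ".join(words[i:i + FRAGMENT_WORDS])

quote_index = defaultdict(list)  # fragment hash -> list of (url, position)

def index_page(url, text):
    for pos, fragment in extract_fragments(text):
        quote_index[hash(fragment)].append((url, pos))

def find_quotes(url, text):
    """Return fragments of this page that also appear on other pages."""
    hits = []
    for pos, fragment in extract_fragments(text):
        other_urls = [u for u, _ in quote_index[hash(fragment)] if u != url]
        if other_urls:
            hits.append((pos, fragment, other_urls))
    return hits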
Scenario: user posts status, user receives news feed from friends
Service: news feed, friendship
When a user posts a status, write it to the DB.
When a user requests the news feed, fetch the top 100 statuses from each friend, merge them, and return the overall top 100 (sketched below, after the storage schema).
Storage:
Status table (NoSQL): row key = user id, column key = timestamp, value = status.
Friend table (NoSQL): row key = user id, column key = timestamp, value = friend id.
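A minimal sketch of the pull-model read path above, assuming hypothetical helpers get_friend_ids and get_recent_statuses that read the friend and status tables:

# Minimal sketch of the pull-model news feed read.
# get_friend_ids / get_recent_statuses are hypothetical readers of the
# friend and status tables (row key = user id, column key = timestamp).
import heapq
import itertools

TOP_N = 100

def get_news_feed(user_id, get_friend_ids, get_recent_statuses):
    """Merge each friend's latest TOP_N statuses and return the newest TOP_N.

    get_recent_statuses(friend_id, limit) is assumed to return a list of
    (timestamp, status) tuples sorted newest-first.
    """
    per_friend = [get_recent_statuses(fid, TOP_N)
                  for fid in get_friend_ids(user_id)]
    # k-way merge by descending timestamp, then keep only the newest TOP_N
    merged = heapq.merge(*per_friend, key=lambda item: -item[0])
    return list(itertools.islice(merged, TOP_N))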
Disk = 1B DAU * 0.5 posts/day * 100 bytes = 50 GB/day.
Scale
Read QPS = 1B * 5 reads/day / 86,400 s ≈ 60k; round up to 100k.
peak = 200k, growth = 400k
Write QPS = 1B * 0.5 posts/day / 86,400 s ≈ 6k; round up to 10k.
peak = 20k, growth = 40k
Turn on caching since the QPS is high.
Memory = 1B users * 100 statuses * 100 bytes = 10 TB.
So we need at least 100 database servers with 100 GB of RAM each.
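A quick back-of-envelope check of the news feed numbers above (the QPS figures are rounded up generously):

# Back-of-envelope check of the news feed capacity estimates above.
DAU = 1_000_000_000
POSTS_PER_DAY = 0.5
READS_PER_DAY = 5
STATUS_BYTES = 100
SECONDS_PER_DAY = 86_400

disk_per_day = DAU * POSTS_PER_DAY * STATUS_BYTES   # 5e10 B = 50 GB/day
read_qps = DAU * READS_PER_DAY / SECONDS_PER_DAY    # ~58k, rounded up to 100k
write_qps = DAU * POSTS_PER_DAY / SECONDS_PER_DAY   # ~6k, rounded up to 10k
cache_bytes = DAU * 100 * STATUS_BYTES              # 1e13 B = 10 TB of hot statuses
servers = cache_bytes / (100 * 10**9)               # 100 servers with 100 GB RAM each

print(disk_per_day, round(read_qps), round(write_qps), cache_bytes, servers)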
Solution 1: O(n). The input is 10 trillion 32-bit integers distributed across many machines.
Each machine creates an integer array of size 2^16 to count how many of its numbers fall into each of the 2^16 intervals of the 32-bit range (interval j covers [j*2^16, (j+1)*2^16)). The array only takes 2^16 * 4 bytes = 256 KB of memory; summed across all the machines the counts come to about 2.5 GB.
One master machine creates a 64-bit integer array of size 2^16 to merge the counts.
From the merged counts, the master knows the median is the ith number in the jth bucket.
In the second pass, each machine counts the frequency of each number that falls in the jth bucket and sends these counts to the master.
The master then identifies the ith number as the median. If the 2.5 GB of data transferred between servers is the bottleneck, we can do a three-level partition instead, splitting the 32 bits into 2^11, 2^11, and 2^10.
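A minimal single-process sketch of the two-pass bucketing idea above; the per-machine and master roles are collapsed into one function for brevity, and each machine's data is assumed to arrive as a list:

# Minimal sketch of the two-pass bucket median; per-machine and master
# roles are collapsed into one process for brevity.
def distributed_median(machines):
    """machines: list of lists of 32-bit unsigned integers."""
    total = sum(len(m) for m in machines)
    target = total // 2  # 0-based index of the median element

    # Pass 1: each "machine" counts numbers per 2**16-wide interval.
    merged = [0] * (1 << 16)
    for data in machines:
        for x in data:
            merged[x >> 16] += 1

    # Master walks the merged counts to find bucket j and offset i.
    j, seen = 0, 0
    while seen + merged[j] <= target:
        seen += merged[j]
        j += 1
    i = target - seen  # median is the i-th smallest number inside bucket j

    # Pass 2: count frequencies of individual values inside bucket j only.
    fine = [0] * (1 << 16)
    for data in machines:
        for x in data:
            if x >> 16 == j:
                fine[x & 0xFFFF] += 1

    k, seen = 0, 0
    while seen + fine[k] <= i:
        seen += fine[k]
        k += 1
    return (j << 16) | k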
Solution 2: MapReduce.
The merge step can also be done in parallel, which hints at MapReduce.
Use MapReduce to count the frequency of each number.
The output contains at most 4B (integer, frequency) pairs, sorted by integer.
We can then scan the output file to find the median.
If scanning the output file becomes the bottleneck, we can do a multi-level MapReduce similar to Solution 1.
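A minimal sketch of the counting job in plain Python; the mapper and reducer are written as generators, and a real MapReduce framework would handle the shuffle/sort between them:

# Minimal sketch of the counting job; a real MapReduce framework would
# perform the shuffle/sort between mapper and reducer.
from itertools import groupby

def mapper(numbers):
    # Emit (number, 1) for every input integer.
    for x in numbers:
        yield x, 1

def reducer(sorted_pairs):
    # Pairs arrive grouped by key after the shuffle; sum each group's counts.
    for key, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

def median_from_counts(pairs, total):
    # Scan the (integer, frequency) output, sorted by integer, for the median.
    target, seen = total // 2, 0
    for value, count in pairs:
        if seen + count > target:
            return value
        seen += count

# Tiny usage example
data = [7, 3, 3, 9, 7, 7, 1]
pairs = list(reducer(sorted(mapper(data))))
print(median_from_counts(pairs, len(data)))  # 7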
message service->channel service<->push service<->client
friendship service
The channel service stores in memory whether a user is online.
When a user visits the Facebook homepage, the message service returns a push server address to the user and tells the channel service that the user is subscribed for push. The user then connects to the push server via a socket.
When the user closes Facebook, the socket is disconnected and the push service tells the channel service to unsubscribe the user.
When a comment is posted, the message service sends the comment to the channel service. The channel service looks up the thread's followers and forwards the comment, through the push service, only to the followers who are online.
Push servers are sharded by user id with consistent hashing.
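A minimal sketch of picking a push server by consistent hashing on user id, assuming a fixed server list and a small number of virtual nodes per server (all names are illustrative):

# Minimal sketch of consistent hashing for choosing a push server by user id.
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (hash, server) virtual nodes
        for server in servers:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_server(self, user_id):
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self.keys, self._hash(str(user_id))) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["push1", "push2", "push3"])
print(ring.get_server(12345))  # the push server this user connects to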
Search QPS = 1B DAU * 2 searches/day / 86,400 s ≈ 20k/s.
Peak = 40k/s.
One counter server can keep the sum in memory.
All web servers inform the counter when a search request is received.
A second counter does the same thing to prevent a single point of failure.
The slave reports to the master when it hits 1B.
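A minimal sketch of the in-memory counter, assuming web servers call increment() over some RPC layer and a hypothetical notify_master() callback fires at the 1B threshold:

# Minimal sketch of the in-memory search counter. increment() would be
# exposed over RPC to the web servers; notify_master() is a hypothetical
# callback fired when the 1B threshold is reached.
import threading

class SearchCounter:
    THRESHOLD = 1_000_000_000

    def __init__(self, notify_master):
        self._count = 0
        self._lock = threading.Lock()
        self._notify_master = notify_master

    def increment(self):
        """Called by web servers for every search request received."""
        with self._lock:
            self._count += 1
            if self._count == self.THRESHOLD:
                self._notify_master(self._count)

    def value(self):
        with self._lock:
            return self._count

counter = SearchCounter(notify_master=lambda c: print(f"hit {c} searches"))
counter.increment()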