Facebook Interview Question for Software Engineer / Developers


Country: United States
Interview Type: In-Person





The question specifies a bot, so it should not be a vanilla spider.

1. We should be familiar with the target site to be downloaded. That helps in classifying and distributing the work among the bots.
2. There would be a control bot in the network that distributes the payload to the other bots. Any bot can become the control bot in case the current one goes down.
3. The downloaded webpages remain on the client (the worker bot that fetched them). Only metadata about what was downloaded is sent to the control bot. Subsequent queries can use this metadata to route a request to the bot that holds the page.
4. The network activity should be covert, so worker bots operate in small groups to avoid network detection. The payload should also be picked from different sections of the website to make the traffic look more normal.
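
Point 3 could look roughly like the sketch below. It is only an illustration: the `ControlBot` class, its method names, the bot ids, and the example URLs are all made up here, and a real control bot would also need replication so another bot can take over.

```python
from typing import Dict, Optional


class ControlBot:
    """Holds metadata only: which worker bot stores which downloaded URL."""

    def __init__(self) -> None:
        self.location: Dict[str, str] = {}  # url -> id of the bot holding the page

    def report_download(self, bot_id: str, url: str) -> None:
        # The worker keeps the page itself and reports just the URL it holds.
        self.location[url] = bot_id

    def route_query(self, url: str) -> Optional[str]:
        # Returns the bot that stores the page, or None if nobody has it yet.
        return self.location.get(url)


control = ControlBot()
control.report_download("bot-a", "http://example.com/index.html")
control.report_download("bot-b", "http://example.com/about.html")
holder = control.route_query("http://example.com/about.html")
```

Here `holder` ends up naming the worker that reported the page, so a query can be forwarded there without the control bot ever storing page content.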

- Kumar March 03, 2014

Two-phase processing.
1st phase: elect a server that gathers, for each server, its IP address and its number of hash buckets (which depends on the size of each hash bucket and the storage capacity of that server). The elected server sends this information to every server. Alternatively, you can discover the server information manually and distribute it to all servers. The elected server also takes the IP address that is exposed to clients.
2nd phase:
- The elected server hashes the top URL into an IP_hash_bucket_number using consistent hashing and sends the retrieval request to the mapped server.
- Each bot waits for retrieval requests.
- If a request is received and the URL has not been fetched yet (or has been updated since), the bot retrieves the URL, parses the returned page, and extracts the linked URLs (external links can perhaps be excluded in this phase).
- For each URL extracted from the page, the bot hashes it into an IP_hash_bucket_number using consistent hashing and sends the retrieval request to the mapped server.
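
The URL-to-server mapping in the 2nd phase could be sketched as a hash ring, as below. This is one common way to do consistent hashing, not necessarily what the interviewer expects. `HashRing`, `route_links`, the IP addresses, and the bucket counts are hypothetical, and MD5 is used only to spread keys, not for security.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple


def _hash(key: str) -> int:
    # 128-bit integer derived from the key; used only for placement on the ring.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Each server owns as many points on the ring as it has hash buckets."""

    def __init__(self, buckets_per_server: Dict[str, int]) -> None:
        points: List[Tuple[int, str]] = []
        for ip, n_buckets in buckets_per_server.items():
            for i in range(n_buckets):
                points.append((_hash(f"{ip}#{i}"), ip))
        if not points:
            raise ValueError("ring needs at least one bucket")
        points.sort()
        self._keys = [h for h, _ in points]
        self._owners = [ip for _, ip in points]

    def server_for(self, url: str) -> str:
        # First ring point clockwise from the URL's hash, wrapping at the end.
        idx = bisect.bisect_right(self._keys, _hash(url)) % len(self._keys)
        return self._owners[idx]


def route_links(ring: HashRing, links: List[str]) -> Dict[str, List[str]]:
    """Group extracted links by the server that should receive the request."""
    batches: Dict[str, List[str]] = {}
    for url in links:
        batches.setdefault(ring.server_for(url), []).append(url)
    return batches


ring = HashRing({"10.0.0.1": 100, "10.0.0.2": 100, "10.0.0.3": 200})
links = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
batches = route_links(ring, links)
```

Giving a server more ring points is how the per-server bucket count from the 1st phase shows up here. If a server drops out, only the URLs it owned move to other servers, so the rest of the crawl state stays where it is.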

The elected server sends heartbeat messages to keep track of active servers. If a timeout occurs, the elected server restarts the 1st phase.

If a server does not receive a heartbeat message within the timeout, it starts the 1st phase on its own.
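
The timeout check on either side could be as simple as the sketch below. `HeartbeatMonitor`, `should_start_election`, and the 5-second timeout are assumptions for illustration. Timestamps are passed in explicitly so the logic can be followed without a real clock or network.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Tracks when each peer was last heard from."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}  # peer IP -> time of last heartbeat

    def record(self, peer: str, now: float) -> None:
        self.last_seen[peer] = now

    def expired(self, now: float) -> List[str]:
        # Peers whose last heartbeat is older than the timeout.
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)


def should_start_election(monitor: HeartbeatMonitor, now: float) -> bool:
    # Any expired peer means the server list is stale, so rerun the 1st phase.
    return bool(monitor.expired(now))


monitor = HeartbeatMonitor(timeout=5.0)
monitor.record("10.0.0.1", now=0.0)
monitor.record("10.0.0.2", now=4.0)
stale = monitor.expired(now=6.0)
```

The elected server would track every worker this way, while a worker would track only the elected server. In both cases a non-empty `expired` list is the trigger for the 1st phase.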

Any client can send a query for a URL to the elected server. The elected server hashes the URL and sends a redirect message to the client, and the client then queries the redirected server for the URL.

- anonymous November 05, 2014