Microsoft Interview Question






Comment hidden because of low score. Click to expand.
2
of 2 vote

A simple web crawler is pretty easy to implement. In Java, I know that there are a few libraries that would help you parse HTML pages.
Given an URL, get all the the URLs that are in this page. Then it becomes a Breadth First Search or Depth First Search traversals. Whatever you choose. DFS might consume too much memory in this case.

- vodangkhoa February 27, 2007 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

I think intention of this question is to think about the complex cases. How will you handle infinite loop, how will you handle url shortener etc.

- sam January 10, 2016 | Flag
Comment hidden because of low score. Click to expand.
1
of 1 vote

There are atleast 3 components that are required.
1. HTTP Request/Getting page.
2. HTML Parser
3. URL Tracker

THe first component is to request a given URL and either download it to the machine or just keep it in memory. (Downloading will need design to store the web page for easy retreival)

HTML Parser - Removes the html tags and retains text of interest (I needed only part of the page based on some pattern) and URL s in the current page. A more generic webcrawler will have to save different components like image/sound etc

URL Tracker - URL tracker makes sure that no URL is visited twice within a set time frame( A simple mechanism is a hash table with a user-defined comparator function, some urls may still point to the exact same page eg www.abc.com and www.abc.com/index.htm)

The web crawler should start with a set of URLs.

- mp March 07, 2007 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

Think about maintaining a data structure for mainting links (fan-in and fan-out) counts. One might also be interested in displaying a graph(forest) of node(URL) and edges(fan-in and fan-out).

- peace November 04, 2009 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

use lucene and much and get done with it

- vipul March 03, 2007 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

Hey dude . . That info was useful

- Sriram October 18, 2009 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More