Amazon Interview Question
Software Engineer / DevelopersThe web is like a graph for a domain.
Webcrawler is like traversing that graph.
Now, usually when we traverse the graph, we mark a node visited (or create a hashset and maintain the set of visited nodes ).
In this case we can maintain a hashmap with url -> count and increment the count when its encountered. So when the traverse finishes we will have a map with the count.
At the end, just iterate through all the keys and find the url with max value.
Need moderation..
I understand that careercup is an open forum where ppl can come and post their interview question... but >50% lack the full information... I would like moderator to add another filter...
which says reviewed/approved.
a set of moderator reviews the questions and decides if it contains full information or not...
and other people when they filter for company and topic ...they can have another filter ....approved or all....
this will save hell of time that goes in trying to guess /trying to understand questions that are posted without full information...without example....
please save careercup from becoming trash....it's such a wonderful site... let it remain wonderful...save careercup please
//I have modified it to find the most linked filename in a given directory, since the original question can just parse href"" I am assuming all words are filenames
- Stephen November 04, 2008//Included
#include <string>
#include <iostream>
#include <fstream>
#include <map>
using namespace std;
//Prototypes
void calculate_linkage(string start_file);
//Variables
//To keep track of number of times a given filename has been seen
//NOTE only will open filename when filename is not in the map
map<string, int> filemap;
int main(int argc, char** argv) {
string most_linked;
string start_file(argv[1]);
int num_linked = 0;
calculate_linkage(start_file); //Calculate the linking
//Iterate over the map
for(map<string, int>::const_iterator it = filemap.begin(); it != filemap.end(); ++it)
{
if (it->second > num_linked) {
most_linked = it->first;
num_linked = it->second;
}
}
cout << most_linked << ' ' << num_linked;
return 0;
}
void calculate_linkage(string start_file) {
string s;
map<string, int>::iterator it;
//Open file and check that it worked
ifstream fin(start_file.c_str());
if (!fin.is_open())
return;
//Read word by word from file (filename by filename)
while (fin >> s) {
//Lookup filename
it = filemap.find(s);
//If the filename is not in the map
//Add the name into the map and that we have found it 1 time
//And open the file for counting again
if (it == filemap.end()) {
filemap.insert(pair<string, int>(s, 1));
calculate_linkage(s);
}
//Otherwise increment the count of the map
else {
(*it).second++;
}
}
fin.close();
}