Intuit Interview Question
Software Engineer / Developer

Read the file. For each word in the file, call addWord():
Map<String, List<Integer>> positionsMap = new HashMap<>();

void addWord(String word, int fileOffset) {
    List<Integer> positions = positionsMap.get(word);
    if (positions == null) {
        positions = new ArrayList<>();
        positionsMap.put(word, positions); // bug fix: the new list must be stored back in the map
    }
    positions.add(fileOffset); // autoboxing makes Integer.valueOf() unnecessary
}
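The null-check-then-put pattern above can be compacted with Map.computeIfAbsent, which creates and stores the list for a new word in one step. A minimal runnable sketch (class and test words are illustrative):

```java
import java.util.*;

public class WordPositions {
    static final Map<String, List<Integer>> positionsMap = new HashMap<>();

    // Same behavior as addWord above: computeIfAbsent inserts a fresh
    // ArrayList for unseen words, then we append the offset.
    static void addWord(String word, int fileOffset) {
        positionsMap.computeIfAbsent(word, k -> new ArrayList<>()).add(fileOffset);
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "a"};
        for (int i = 0; i < words.length; i++) addWord(words[i], i);
        System.out.println(positionsMap.get("a")); // [0, 2]
    }
}
```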
This could be done much faster with multithreading.
Say a book has n pages: divide it into 10 parts of n/10 pages each (say) and run a thread on each part.
As you scan each page, divide the page into 5 sections and start a thread on each section to search for the word; if any of these threads finds the word, record the page number and cancel the remaining threads in the pool.
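The partitioned scan described above can be sketched with an ExecutorService: each worker indexes one slice of the word stream into a shared ConcurrentHashMap. This is a sketch under assumed inputs (a pre-tokenized word array; the class name and part count are illustrative):

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelIndex {
    // Build word -> positions concurrently. ConcurrentHashMap.compute runs
    // atomically per key, so appends to a word's list never race.
    static Map<String, List<Integer>> index(String[] words, int parts)
            throws InterruptedException {
        ConcurrentMap<String, List<Integer>> map = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        int chunk = (words.length + parts - 1) / parts; // ceiling division
        for (int p = 0; p < parts; p++) {
            final int start = p * chunk;
            final int end = Math.min(words.length, start + chunk);
            pool.submit(() -> {
                for (int i = start; i < end; i++) {
                    final int pos = i;
                    map.compute(words[pos], (w, list) -> {
                        if (list == null) list = new ArrayList<>();
                        list.add(pos);
                        return list;
                    });
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return map;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] words = {"to", "be", "or", "not", "to", "be"};
        Map<String, List<Integer>> idx = index(words, 2);
        System.out.println(idx.get("to").size()); // "to" occurs twice
    }
}
```

Note that list ordering across parts is not guaranteed; sort each list afterwards if offsets must be ascending.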
Assume the input arrives as one file per page, each containing words.
Mapper code:
Each mapper reads one file. Use a separator, say "$".
output: word, fileName$1
The reducer receives the input as:
word, list of values
Each value has the form fileName$1, for example:
file1$1, file1$1, file2$1, file3$1, ...
Now we can process this list and build a structure Map<String, Map<String, Integer>>,
which we can write to a text file in the format:
word -> file -> count
example:
abc -> file1 -> 2
abc -> file2 -> 1
abc -> file3 -> 1 ...
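The reducer-side aggregation can be sketched as follows: for one word, collapse values like "file1$1" into per-file counts. The "$" separator matches the mapper output above (the class and method names are illustrative, not Hadoop API):

```java
import java.util.*;

public class ReduceCounts {
    // For a single word, turn the reducer's value list into file -> count.
    static Map<String, Integer> reduce(List<String> values) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            String[] parts = v.split("\\$"); // "$" must be escaped in a regex
            String file = parts[0];
            int n = Integer.parseInt(parts[1]);
            counts.merge(file, n, Integer::sum); // accumulate per-file counts
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> values = List.of("file1$1", "file1$1", "file2$1", "file3$1");
        Map<String, Integer> counts = reduce(values);
        System.out.println("abc -> file1 -> " + counts.get("file1")); // count is 2
    }
}
```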
This is a classic MapReduce problem, but on a single machine it can be solved with a hash map. Maintain a HashMap<Word, Set<PageNumber>>: for each word encountered, if we have already seen the word, add the page number to its value set; otherwise add the word and page number as a new entry.
- Anonymous August 02, 2012
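The single-machine approach above can be sketched in a few lines (the whitespace tokenization and class name are assumptions for the example):

```java
import java.util.*;

public class PageIndex {
    // word -> sorted set of page numbers it appears on.
    // pages[i] holds the text of page i+1.
    static Map<String, Set<Integer>> index(String[] pages) {
        Map<String, Set<Integer>> map = new HashMap<>();
        for (int p = 0; p < pages.length; p++) {
            for (String word : pages[p].toLowerCase().split("\\s+")) {
                if (word.isEmpty()) continue;
                // computeIfAbsent covers the "word not yet seen" branch;
                // the Set deduplicates repeats within the same page.
                map.computeIfAbsent(word, k -> new TreeSet<>()).add(p + 1);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        String[] pages = {"the quick fox", "the lazy dog", "fox and dog"};
        Map<String, Set<Integer>> idx = index(pages);
        System.out.println(idx.get("fox")); // appears on pages 1 and 3
    }
}
```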