Yahoo Interview Question for Software Engineer / Developers


Country: United States





You will need a hashtable and a min-heap for this.
1) Begin by populating the hashtable with each string (key) and its frequency (value).
2) Once all the strings have been counted, iterate through the hashtable while maintaining a min-heap of size K (here K = 1000), keyed on frequency.
3) Iterate till the end, adding a key from the hashtable only when its count is greater than the count at the top of the heap (evicting the top). A sketch of this appears below.

- wolfengineer February 20, 2014 | Flag Reply
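
A minimal sketch of the approach above, assuming Java; the class and method names (TopKByCount, topK) are illustrative, not from the original answer:

{{{
import java.util.*;

// Sketch of the hashtable + size-K min-heap approach:
// 1) count every string, 2) keep only the K largest counts in a min-heap.
class TopKByCount {
    static List<String> topK(Iterable<String> strings, int k) {
        // 1) frequency table: string -> count
        Map<String, Integer> freq = new HashMap<>();
        for (String s : strings) freq.merge(s, 1, Integer::sum);

        // 2) min-heap of size K ordered by count; the root is always the
        //    smallest count kept so far, so it is cheap to compare and evict.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            if (heap.size() < k) {
                heap.offer(e);
            } else if (e.getValue() > heap.peek().getValue()) {
                heap.poll();          // 3) evict the current minimum
                heap.offer(e);
            }
        }

        // drain the heap; reverse so the most frequent string comes first
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) result.add(heap.poll().getKey());
        Collections.reverse(result);
        return result;
    }
}
}}}

For example, topK(Arrays.asList("a", "b", "a", "c", "a", "b"), 2) would return ["a", "b"]. With K = 1000 this is roughly O(N + D log K) for D distinct strings, but it still holds every distinct string in memory, which is the concern raised in the reply below.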

Let's say the file has 2 billion strings and 1 billion of them are distinct; then the hashtable must hold 1 billion entries. Doesn't that require a lot of memory? Am I missing something?

- snigda May 26, 2014 | Flag

I guess any algorithm that stores the strings is going to face space issues. One way to store strings efficiently is a trie: at each leaf node we can store the number of times the string appears, and then simply run the standard top-K elements algorithm over those counts.
Time complexity: O(N).
Space complexity: not sure. It would be great if anybody could shed more light on it.

- kr.neerav February 20, 2014 | Flag Reply
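
A minimal sketch of the trie idea above, assuming Java and lowercase a-z words only; the names (TrieCounter, topK) are illustrative:

{{{
import java.util.*;

// Trie that stores a count at each word-terminal node, plus a size-K
// min-heap used while walking the trie to keep the K largest counts.
class TrieCounter {
    private static class Node {
        Node[] child = new Node[26];
        int count;                       // > 0 only where a word ends
    }

    private final Node root = new Node();

    void add(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            int i = c - 'a';
            if (cur.child[i] == null) cur.child[i] = new Node();
            cur = cur.child[i];
        }
        cur.count++;                     // terminal node holds the frequency
    }

    List<Map.Entry<String, Integer>> topK(int k) {
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        collect(root, new StringBuilder(), k, heap);
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return result;
    }

    private void collect(Node node, StringBuilder prefix, int k,
                         PriorityQueue<Map.Entry<String, Integer>> heap) {
        if (node.count > 0) {
            heap.offer(new AbstractMap.SimpleEntry<>(prefix.toString(), node.count));
            if (heap.size() > k) heap.poll();    // drop the current minimum
        }
        for (int i = 0; i < 26; i++) {
            if (node.child[i] != null) {
                prefix.append((char) ('a' + i));
                collect(node.child[i], prefix, k, heap);
                prefix.deleteCharAt(prefix.length() - 1);
            }
        }
    }
}
}}}

Space-wise the trie shares common prefixes, so for many similar strings it can be smaller than a hash table of full strings, but in the worst case (long strings with few shared prefixes) the per-node child arrays can make it larger.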

I thought of the same:
Build 26 tries, one for each starting letter, stored in a hashmap, and keep a count of each word in its leaf node.

Sounds good so far, but how will we find the top 1000?

- byteattack June 01, 2014 | Flag

Can you explain the need to build 26 tries? I didn't get that part.
To find the top 1000, store the count from each leaf node in an array and then select the top 1000 from it.

- kr.neerav June 16, 2014 | Flag

Reading the other solutions, it looks like using a max-heap after building the trie would be more space efficient.

- kr.neerav June 16, 2014 | Flag

Can we solve it like this:
1) Maintain a hashmap of (string -> (count, pointer to min-heap node)) and a min-heap of the top K strings.
2) For each new string, first increment its count in the hashmap. If the resulting count is greater than the value at the heap's root, remove the root node (when the number of nodes in the heap equals K) and insert the new string. We also need to check whether the string already exists in the heap and, if so, update its node instead. (A rough sketch follows this comment.)

- samuel February 20, 2014 | Flag Reply
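
A rough sketch of this streaming variant, assuming Java; java.util.PriorityQueue has no decrease-key, so instead of keeping a pointer to the heap node the sketch removes and re-inserts an entry whose count changed, which costs O(K) per update:

{{{
import java.util.*;

// Hypothetical sketch: a count map plus a size-K min-heap kept in sync
// as each string arrives from the stream.
class StreamingTopK {
    private final int k;
    private final Map<String, Integer> counts = new HashMap<>();
    private final PriorityQueue<String> heap;       // ordered by current count
    private final Set<String> inHeap = new HashSet<>();

    StreamingTopK(int k) {
        this.k = k;
        this.heap = new PriorityQueue<String>(
                (a, b) -> Integer.compare(counts.get(a), counts.get(b)));
    }

    void offer(String s) {
        counts.merge(s, 1, Integer::sum);
        if (inHeap.contains(s)) {
            heap.remove(s);                          // refresh its heap position
            heap.offer(s);
        } else if (heap.size() < k) {
            heap.offer(s);
            inHeap.add(s);
        } else if (counts.get(s) > counts.get(heap.peek())) {
            inHeap.remove(heap.poll());              // evict the current minimum
            heap.offer(s);
            inHeap.add(s);
        }
    }

    List<String> current() {
        return new ArrayList<>(heap);                // unordered snapshot of the top K
    }
}
}}}

Note that the counts map still grows with the number of distinct strings, so this variant does not by itself solve the memory problem discussed above.
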
{{{
// find the top 1000 strings that appear frequently out of billions
// (reconstructed from the original pseudocode; as written it only counts how
//  often one search string appears in the input file, not the top 1000)
Scanner inputFile = new Scanner(new File("input.txt"));   // placeholder input file
String searchVar = "lamp";
int count = 0;
while (inputFile.hasNext()) {
    if (inputFile.next().equals(searchVar)) {
        count++;
    }
}
System.out.println("number of occurrences: " + count);
}}}

- rizla February 20, 2014 | Flag Reply

A min-heap that holds the current top (maximum) elements would help :)

- Anonymous February 20, 2014 | Flag Reply

I think we can calculate a hash value for each string so that we never compare whole strings.
The following is simple:
Maintain a HashMap of <hash value, repeat count>.
Maintain a HashMap of the top 1000 strings as <hash value, string>.

- huichen229 February 20, 2014 | Flag Reply
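
A hedged sketch of this idea, assuming Java and a 64-bit hash (FNV-1a here, purely as an example); the caveat is that two distinct strings can share a hash value, so the counts are approximate:

{{{
import java.util.HashMap;
import java.util.Map;

// Count by hash value instead of by the full string.
class HashValueCounter {
    private final Map<Long, Integer> countByHash = new HashMap<>();  // hash -> repeat count
    private final Map<Long, String> stringByHash = new HashMap<>();  // hash -> a string with that hash

    // FNV-1a 64-bit string hash (an assumption, not specified in the post).
    private static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x100000001b3L;
        }
        return h;
    }

    void add(String s) {
        long h = hash(s);
        countByHash.merge(h, 1, Integer::sum);
        stringByHash.putIfAbsent(h, s);              // remember one string per hash
    }
}
}}}

The top-1000 selection would then run over countByHash exactly as in the min-heap sketches above, using stringByHash to map the winning hashes back to strings.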

Use a trie along with the min-heap.
A trie would take much less space than a hash table.

- maxomax March 11, 2014 | Flag Reply

This looks like a MapReduce problem, i.e. calculate the count of each word and then sort at the end.
Basically, have each Map task take a chunk of roughly 100,000 strings (we have about 10,000 nodes) and output a word and a count for each word it sees.
All rows with the same word are then sent to a single Reduce task, which aggregates the counts.
Finally, sort the aggregated list by count and pick the top 1000 (not sure whether Quicksort is possible in the context of MapReduce/Hive).

- Subbu September 14, 2014 | Flag Reply
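
A sketch of the counting phase, assuming Hadoop MapReduce in Java; the class names are illustrative, and the top-1000 selection would be a second job (or a size-1000 min-heap over the reducer output) rather than a full sort:

{{{
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountSketch {

    // Map: each task reads its split of the input and emits (word, 1).
    public static class CountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: all (word, 1) pairs for the same word arrive here and are summed.
    public static class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
}}}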

There are many approaches; some of them use data partitioning or MapReduce.
I have written code that handles a small chunk of data and finds the top 5 most frequent words; it can be modified accordingly.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

/**
 * Finds the most frequently occurring words in a small chunk of text,
 * ignoring words contained in a crude stop-word "dictionary" string.
 */
public class FrequentlyOccurredString {

    static List<Entry<String, Integer>> listOfWords = new ArrayList<Entry<String, Integer>>();

    /** Sorts the map entries by count in descending order. */
    public static void sortMap(HashMap<String, Integer> inputMap) {
        Set<Entry<String, Integer>> set = inputMap.entrySet();
        listOfWords = new ArrayList<Entry<String, Integer>>(set);
        Collections.sort(listOfWords, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> obj1,
                               Map.Entry<String, Integer> obj2) {
                return obj2.getValue().compareTo(obj1.getValue());
            }
        });
    }

    /** Prints the top five entries of the sorted list. */
    public static void printMap(List<Entry<String, Integer>> list) {
        int count = 0;
        for (Map.Entry<String, Integer> wordOccurrence : list) {
            if (count <= 4) {
                System.out.println("'" + wordOccurrence.getKey() + "' - "
                        + wordOccurrence.getValue());
            }
            count++;
        }
    }

    public static void findFrequentlyOccurredString(String inpString, String dictionaryString) {

        final int DEFAULT_VALUE = 1;

        // split on whitespace (the original replace(" ", "") would have
        // collapsed the whole input into a single token)
        String[] storeStringToken = inpString.trim().split("\\s+");
        HashMap<String, Integer> countStringOccurrence = new HashMap<String, Integer>();

        for (int i = 0; i < storeStringToken.length; i++) {
            String word = storeStringToken[i].trim();

            if (!countStringOccurrence.containsKey(word)) {
                // insert new words, skipping anything found in the stop-word string
                if (!dictionaryString.toLowerCase().contains(word.toLowerCase())) {
                    countStringOccurrence.put(word, DEFAULT_VALUE);
                }
            } else {
                // word already counted: just increment its value
                countStringOccurrence.put(word, countStringOccurrence.get(word) + 1);
            }
        }

        // sort the counts in descending order
        sortMap(countStringOccurrence);

        // print the top entries
        printMap(listOfWords);
    }

    public static void main(String[] args) {
        String inp = "It is globally known for its Web portal, search engine Yahoo! Search, and related services, including Yahoo! Directory, Yahoo! Mail, Yahoo! News, Yahoo! Finance, Yahoo! Groups, Yahoo! Answers, advertising, online mapping, video sharing, fantasy sports, and its social media website.";
        String dictionary = "andtheinorbetweenisaMrtheretheirthemthosetheybyonbutduringafterbeforeassuchreturnsfollowsfromtoaboveofoffwantsbecauseitsknown";
        inp = inp.replaceAll("[^a-zA-Z\\s]", "");
        findFrequentlyOccurredString(inp, dictionary);
    }
}


- Sandy0109 September 01, 2016 | Flag Reply

