Yahoo Interview Question
Software Engineer / DevelopersCountry: United States
I guess any algorithm that stores the strings is going to face space issues. One way to store strings efficiently is a trie. At each leaf node we can store the number of times it appears. Then you can simply use the top K min/max elements in the array algorithm.
Time complexity O(N)
Space complexity not sure. It will be great if anybody can shed more light on it.
I thought of the same,
Build 26 trie's one of each letter store in hashmap
keep a count of each word in the leaf node.
sounds good for now,
how will we find top 1000?
Can you explain the need to build 26 tries, i didn't get it.
To find top 1000 store the values of each leaf node in an array and then find top 1000.
can we solve it like this:
1) maintain a hashmap for (string, (repeats, pointer to min-heap node)), and a min-heap for the top k strings
2) for a new string, we first increments its "repeats" in hashmap, if the resulting "repeats" is greater than the root value, we then need to remove the root node(if the numbers nodes in heap is equal to K). And, we need to check if the new node already exists in the heap,
This seems like a mapReduce problem. i.e. Calculate the count of each word and then sort them in the end.
Basically have the Map output Words and Count in 100,000 Strings (We have 10,000 Nodes) that would each Output a Word and a Count.
We will then have a Reduce for each word (All rows with the same Word are sent to Reduce that aggregates it).
We will then again sort the list by order, and pick the top 1000 (Not sure if Quicksort is possible in the context of MapReduce/Hive).
There are many approach some of them are like using data partition or mapreduce.
I have written code to handle small chunk of data to find top 5 most occurred words, this can be modified accordingly.
and
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
/**
* @author
*
*/
public class FrequentlyOccurredString {
static List<Entry<String, Integer>> listOfWords = new ArrayList<Entry<String, Integer>>();
public static void sortMap(HashMap<String, Integer> inputMap) {
Set<Entry<String, Integer>> set = inputMap.entrySet();
listOfWords = new ArrayList<Entry<String, Integer>>(set);
Collections.sort(listOfWords, new Comparator<Map.Entry<String, Integer>>() {
public int compare(Map.Entry<String, Integer> obj1,
Map.Entry<String, Integer> obj2) {
int value = (obj2.getValue()).compareTo(obj1.getValue());
return value;
}
});
}
public static void printMap(List<Entry<String, Integer>> list) {
int count = 0;
for (Map.Entry<String, Integer> wordOccurrence : list) {
if (count <= 4) {
System.out.println("'" + wordOccurrence.getKey() + "' - " + ""
+ wordOccurrence.getValue());
}
count++;
}
}
public static void findFrequentlyOccurredString(String inpString, String dictionaryString) {
String tempString = "";
int DEFAULT_VALUE = 1;
String[] storeStringToken = inpString.replace(" ", "").trim().split(" ");
HashMap<String, Integer> countStringOccurrence = new HashMap<String, Integer>();
for (int i = 0; i < storeStringToken.length; i++) {
// insert all new words to hashmap
if (!countStringOccurrence.containsKey(storeStringToken[i])) {
if (!dictionaryString.toLowerCase().contains(
storeStringToken[i].toLowerCase())) {
tempString = storeStringToken[i];
countStringOccurrence.put(tempString.trim(), DEFAULT_VALUE);
}
}
// else process existing words from hashmap
else {
tempString = storeStringToken[i];
for (Map.Entry<String, Integer> mapEntry : countStringOccurrence
.entrySet()) {
// check if the key already there
if (mapEntry.getKey().equals(tempString)) {
int keyOccurrence = mapEntry.getValue();
keyOccurrence++;
countStringOccurrence.put(tempString, keyOccurrence);
}
}
}
}
// sort hashmap in decending order
sortMap(countStringOccurrence);
// print hashMap
printMap(listOfWords);
}
public static void main(String[] args) {
String inp = "It is globally known for its Web portal, search engine Yahoo! Search, and related services, including Yahoo! Directory, Yahoo! Mail, Yahoo! News, Yahoo! Finance, Yahoo! Groups, Yahoo! Answers, advertising, online mapping, video sharing, fantasy sports, and its social media website.";
String dictionary = "andtheinorbetweenisaMrtheretheirthemthosetheybyonbutduringafterbeforeassuchreturnsfollowsfromtoaboveofoffwantsbecauseitsknown";
inp = inp.replaceAll("[^a-zA-Z\\s]", "");
findFrequentlyOccurredString(inp, dictionary);
}
}
and
You will need a Hashtable and a Min-heap for this.
- wolfengineer February 20, 20141) Begin with populating the Hashtable with the String (<key>) and their frequency (<value>)
2) Once you completed 1000 strings, iterate through the Hashtable and maintain a min-heap of the size K .
3) Iterate till the end and only add the Keys from the Hashtable that have a value greater than the top of the heap.