Morgan Stanley Interview Question
Software Engineer / DevelopersPlease tell, why time will be O(1)? If one applies hash function to each word, it`ll be O(n*1) = O(n), won`t it?
you are ignoring the word "amortized",
amortized O(1) means average running time per operation over a worst-case sequence of operations is constant.
@blueskin.neo,
suppose I want to print both the word and their count. In that case how would u do that. Can you please write the algorithm for that?
hey hamepal, here you go
/* Algorithm to count words in a file. Please dont reply opinions on the code, comments on complexity analysis is appreciated
For each word check if the word exists in the hash, if it doesnt then insert the word with a count 1
if it does then increment the value
For each word in the hashmap, print word and its count
Complexity: Linear insertion and space
NOTE:The code only compiles with newer C++ use the switch -std=c++0x in your Makefile*/
#include <iostream>
#include <vector>
#include <unordered_map>
typedef std::unordered_map<std::string, int> MyHashMap;
void printMyHashMap(MyHashMap::const_iterator itr,MyHashMap::const_iterator enditr){
while(itr!=enditr){
std::cout<<(*itr).first<<" : "<<(*itr).second<<std::endl;
++itr;
}
}
int main(){
MyHashMap wordsMap; //declare unordered map
//vector of strings
std::vector<std::string> strings {"count","words","in","a","file",
"and","print","words","with","their","count",
".","more","words","file"};
int len = strings.size();
std::vector<std::string>::const_iterator wordsItr;
wordsItr = strings.begin();
MyHashMap::iterator mapItr = wordsMap.begin();
for ( wordsItr=strings.begin(); wordsItr!=strings.end(); wordsItr++ ){
mapItr = wordsMap.find(*wordsItr);
if(mapItr==wordsMap.end()){ //if word not found then
wordsMap.insert(MyHashMap::value_type(*wordsItr, 1));
}else{
(*mapItr).second = ++(*mapItr).second; }
}
std::cout<<"words and their count :"<<std::endl;
printMyHashMap(wordsMap.begin(),wordsMap.end());
return (0);
}
#include <iostream>
#include <algorithm>
#include <vector>
#include <fstream>
#include <map>
#include <iterator>
using namespace std;
int main() {
string str;
ifstream in("input.txt");
map<string, int> m;
if (in.is_open()) {
while(in >> str) {
m[str]++;
}
}
for(map<string, int>::iterator it = m.begin(); it != m.end(); it++) {
cout << (*it).first <<"\t" << (*it).second << endl;
}
}
input.txt
abc abc abc abc abc abc abc abcdef abc abcd
abc abc abc abc abc abc abc abcdef abc abcd
abc abc abc abc abc abc abc abcdef abc abcd
abc abc abc abc abc abc abc abcdef abc abcd
count words in a file and print words with their count more words file
output
ajay@ubuntu:~/workspace$ ./a.out
a 1
abc 32
abcd 4
abcdef 4
and 1
count 2
file 2
in 1
more 1
print 1
their 1
with 1
words 3
Internally, the elements in the map are sorted from lower to higher key value following a specific strict weak ordering criterion set on construction. Assuming red-black trees. the complexity would be O(n*logn). try Hashmap, thanks
RB-trees have insertion item/searching item complexity O(lgn), not O(nlogn). Read Cormen & Co.
Insertion of an element is O(logn) --> for n elements O(nlogn)
Now, we want to traverse all elements, retrieval of each element could be O(n) or O(nlogn) depending on the container used.
But for a hashmap, we have functions that give you O(n) insertion for n elements and O(n) for traversal of n elements. which one would you choose?
My bad, a typo: please ignore the above comment.
Insertion of an element is O(logn) --> for n elements O(nlogn)
Now, we want to traverse all elements, retrieval of n elements could be O(n) or O(nlogn) depending on the container used.
But for a hashmap, we have functions that give you O(n) insertion for n elements and O(n) for traversal of n elements. which one would you choose?
How would you find the unique words and their count?
Unique!
So you just use ArrayList to put words as you read a word and remove it if already exists. So the count will always be one (remember unique words). we are not asked to list other words.
import java.util.HashMap;
public class TestTest {
public static void main(String[] args){
String inputString = "abc abc abc abc abc abc abc abcdef abc abcd abc abc abc abc abc abc abc abcdef abc abcd" +
" abc abc abc abc abc abc abc abcdef abc abcd abc abc abc abc abc abc abc abcdef abc abcd" +
" count words in a file and print words with their count more words file";
HashMap<Object, Integer> myMap = new HashMap<Object, Integer>(){
@Override
public Integer put(Object arg0, Integer arg1) {
Integer i = super.get(arg0);
arg1 = i==null?1:i+1;
return super.put(arg0, arg1);
}
};
String[] splits = inputString.split(" ");
for(String s : splits){
myMap.put(s,1);
}
System.out.println(myMap);
}
}
import java.util.*;
public class Freq {
public static void main(String[] args) {
Map<String, Integer> m = new HashMap<String, Integer>();
// Initialize frequency table from command line
for (String a : args) {
Integer freq = m.get(a);
m.put(a, (freq == null) ? 1 : freq + 1);
}
System.out.println(m.size() + " distinct words:");
System.out.println(m);
}
}
String s = "Hi Hello bob Hello Matt Hi jully" ;
Map<String,Integer> numberOfWords = new HashMap<String,Integer>();
String[] allWords = s.split("\\s");
for (String s1 : allWords) {
if (numberOfWords.containsKey(s1)) {
numberOfWords.put(s1, numberOfWords.get(s1) + 1);
} else {
numberOfWords.put(s1, 1);
}
}
Iterator it = numberOfWords.entrySet().iterator();
while (it.hasNext()) {
Map.Entry pairs = (Map.Entry)it.next();
System.out.println(pairs.getKey() + " = " + pairs.getValue());
}
We can create a Trie based structure with one modification.
Every trie node will store the counts of the words that end with it ( the count will be 0 if it is not a leaf ).
Ex. Hi John ! How are you ? Hi Jane, I'm doing good. Hiya everybody, how's the party ?
will have a node structure for H(0) --> i (2) --> y (0) --> a (1)
This will conserve space and worst time complexity will be O(K) where K is the maximum number of alphabets in that structure.
Of course the author meant: find all the unique words and their count (i.e. how many unique words there are).
Use a binary tree or a hash table (i.e. std::map<string, int> or std::unordered_map<string, int>) to insert all the words with their counts. After this traverse this structure and print all words with count = 1 and also count them (i.e. those with count = 1).
It depends on what operations on the data structure you are interested in. Two alternatives: hashmap , trie. If you just want the count then a hash map with value = count
- blueskin.neo November 19, 2010time: amortized O(1)
space: depends on the hash function, best case: O(unique_words)