Google Interview Question
Country: United States
I agree with this one. There are several ways to solve this problem:
1. Count each character in S, then scan the dictionary and compare counts for each word. However, when the dictionary is big there will be a lot of repeated counting, e.g. the dictionary may contain both "cat" and "cats".
2. Building a trie is a good option, but not easy to pull off in an interview.
3. The problem statement clearly says S has only 4 characters, so generating permutations is not expensive at all. After that, all you need to do is build a hash map over all permutations and scan the dictionary.
This is not too hard (I think). First, we need to define the data structure of the dictionary in memory. We build a hash table of the dictionary where the key is the word's letters in sorted order and the value is a list of all words that can be formed from those letters,
i.e. 'dgo' is the key for the list of words 'dog' and 'god', and 'aet' is the key for the list of words 'eat', 'ate' and 'tea'.
Then, for the input string, we compute all permutations of its letters, and for each permutation we form all the ways of splitting it into groups,
i.e. abcd is one possible permutation, and we create every possible split such as {'a', 'bcd'}, {'a', 'b', 'cd'}, {'a', 'b', 'c', 'd'}, {'abc', 'd'}, {'a', 'bc', 'd'}. For each split, we use each element as a key to search our hash table dictionary.
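A minimal sketch of the sorted-key hash table (names like `buildIndex` and `findWords` are my own; it assumes lowercase words, and because the keys are sorted, no permutations are needed, only sub-multisets of the sorted input):

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <vector>

// Build the dictionary index: sorted letters -> all words with those letters.
std::map<std::string, std::vector<std::string>>
buildIndex(const std::vector<std::string>& dict) {
    std::map<std::string, std::vector<std::string>> index;
    for (const std::string& w : dict) {
        std::string key = w;
        std::sort(key.begin(), key.end());
        index[key].push_back(w);
    }
    return index;
}

// Look up every non-empty sub-multiset of s (a bitmask over its sorted
// letters) in the index.
std::vector<std::string>
findWords(const std::string& s,
          const std::map<std::string, std::vector<std::string>>& index) {
    std::string sorted = s;
    std::sort(sorted.begin(), sorted.end());
    std::set<std::string> out;  // a set: duplicate letters give duplicate masks
    int n = (int)sorted.size();
    for (int mask = 1; mask < (1 << n); ++mask) {
        std::string key;
        for (int i = 0; i < n; ++i)
            if (mask & (1 << i)) key += sorted[i];
        auto it = index.find(key);
        if (it != index.end())
            out.insert(it->second.begin(), it->second.end());
    }
    return std::vector<std::string>(out.begin(), out.end());
}
```

With the example dictionary, `findWords("ogeg", idx)` would yield "go", "egg" and "ego" but not "god" (no 'd' available).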
Nice idea as well. The only thing I don't like is creating every permutation of the string, as mentioned above. So far both the trie with sorted keys and the hash table with sorted keys seem like valid solutions to this problem.
I still have the impression that there must be something better.
hi autoboli:
Thanks for your comments. In fact, I thought about that, and I actually do not need to create permutations of the string. All I need is, for each input string, to create all possible subsets, including the empty set. But of course "set" is used loosely here: in the string 'aaab', the first 'a' is different from the second 'a', so I cannot reduce it to the set {'a', 'b'}; I still need to treat it as the multiset {'a', 'a', 'a', 'b'}.
Well, it is more complicated I think. How about this:
OFFLINE:
1) Take only <=4 char words from dictionary
2) Sort each word, keep unsorted word as well
3) Put it to the trie
If a char node in the trie is the final char of a word, it contains a reference to the list of unsorted dictionary words, e.g.
a-b-c --> {abc, cab, bac}
a-a-f-l --> {alfa, fala}
ONLINE:
1) Sort 's'
2) Search in trie and return corresponding list of words.
It should be O(1) time. Well... no, it is not gonna work :( The problem is that you would have to search for every sorted substring of 's'. Hence, it does not sound like an elegant idea...
If a word contains 2 a's, it can potentially be matched by 2, 3 or 4 a's, but not fewer than 2.
If a word contains 0 b's, it can potentially be matched by 0 b's, 1 b, etc. (Interesting observation - it cannot be matched by 4 b's, because that leaves no letters for the rest of the word. If it contains two other letters, it cannot be matched by 3 b's.)
Just thinking about intersecting sets with given properties to get a very small set to work with.
Seems pretty easy: just define an array to hold the counts of each letter in the string s (remember we can index into an array by converting an ASCII char to its int representation). Doing so requires a linear scan of s.
Now do the same for each word in the dictionary (you can even reuse the array as you go through the words), and compare the count of each character in the word against the count of that character in s. Specifically, every letter's count in the array for s must be at least its count in the array for the given word. If we define n as the total sum of the lengths of all the strings, the algorithm runs in O(n) time and O(1) space.
For the follow-up, simply write the results to a file once you hit the memory limit and free the corresponding memory; rinse and repeat.
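The counting comparison above can be sketched very compactly (`canForm` is my own name; one word is checked at a time, so the dictionary can be streamed rather than held in memory):

```cpp
#include <string>

// True if every letter of word occurs in s at least as often as in word.
bool canForm(const std::string& word, const std::string& s) {
    int counts[256] = {0};
    for (unsigned char c : s) counts[c]++;  // letter counts of s
    for (unsigned char c : word)
        if (--counts[c] < 0) return false;  // word needs more c's than s has
    return true;
}
```

For s = "ogeg", this accepts "go", "egg" and "ego" but rejects "god" and "ggg".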
1. First, filter out all words that can't be created using the characters of S, i.e. any word containing a character not in S: iterate over every word in the dictionary, check whether it contains only characters from S, and discard those that don't.
2. From the result of 1 create a Trie from all remaining words (the ones you can form from S)
3. Traverse the Trie printing all leafs(resulting words)
For English you have 26 files, each responsible for a particular character. If a word contains that character, it appears in that file: "good" appears in files g, o and d. Each record also has a field for how many times the character occurs in the word.
For "good": g - 1, o - 2, d - 1. Each file keeps its words in sorted order. So when you receive a request, you find the intersection of the files for the first two characters in O(n), and so on. You also check the count of each character in the words.
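A small in-memory sketch of this inverted index (names are my own; the per-character "files" are replaced by sorted vectors, which preserves the O(n) merge-intersection idea):

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

// One "file" per character: the sorted list of words containing it.
std::map<char, std::vector<std::string>>
buildFiles(std::vector<std::string> dict) {
    std::sort(dict.begin(), dict.end());
    std::map<char, std::vector<std::string>> files;
    for (const std::string& w : dict)
        for (char c : std::set<char>(w.begin(), w.end()))
            files[c].push_back(w);  // each word listed once per distinct char
    return files;
}

// Merge-intersect two sorted word lists in linear time.
std::vector<std::string> intersect(const std::vector<std::string>& a,
                                   const std::vector<std::string>& b) {
    std::vector<std::string> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```

For instance, intersecting the 'g' and 'o' lists of {"good", "go", "dog", "egg"} leaves {"dog", "go", "good"}; a count check would then filter further.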
I'll use Dictionary in a Hash table format where Keys are in alphabetically sorted order , and values are all possible dictionary strings that can be made from key:
example :
[e] => {}
[g] => {}
[eg] => {egg /* plus any other dictionary strings containing only the letters e and g */ }
[o] => {}
[go] => {go}
[ego] => {ego}
1. Now sort the string s /* s = "ogeg" */ and remove duplicates, hence s' = "ego", with character counts g = 2, e = 1, o = 1
2. generate all subsets of s' (each value of i is a bitmask choosing characters)
string permutations[];
for( i = 1 ; i < 2^(length(s')) ; i++ ){
    string temp = "";
    j = i;
    index = 0;
    while( j != 0 ){
        if( j & 1 ){ /* low bit set: include s'[index] */
            temp = s'[index] + "" + temp;
        }
        index++;
        j = j >> 1; /* advance to the next bit */
    }
    permutations[i] = temp;
}
3. for all permutations find all possible strings from dictionary , taking all permutations as dictionary key
we have variables : g = 2, e = 1, o = 1
- for all the permutations get hashtable[key] values
- loop through values , if isvalid() then print it.
isvalid(){
/* if character count in value = character count in permutation string */
}
I like this approach, but I have a huge improvement to your map. If the map used the counts of letters in 's' and mapped to a collection of valid words produced by those keys, the number of possible entries in the map would be significantly reduced and computational complexity of searching the map would be reduced from O(n!) to O(n^2) where n is the length of 's'. An off-the-cuff example of what I mean is something like
'e1g2' -> { 'egg', 'geg', 'gge' } //if all those strings are valid.
'e1g1o1' -> { 'ego', 'eog', 'geo', 'goe', 'oeg', 'oge' }
etc
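A quick sketch of building such a count-signature key (`signature` is my own helper; an ordered map keeps the letters sorted, so anagrams collapse to the same key):

```cpp
#include <map>
#include <string>

// Build a count signature like "e1g2" from a word's letters.
std::string signature(const std::string& word) {
    std::map<char, int> counts;  // ordered map: letters come out sorted
    for (char c : word) counts[c]++;
    std::string sig;
    for (const auto& kv : counts)
        sig += kv.first + std::to_string(kv.second);
    return sig;
}
```

Both 'egg' and 'geg' map to "e1g2", and 'ego' maps to "e1g1o1", matching the example keys above.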
@ct
I don't think so. I mean, you COULD do that, but it would then make it harder to generate the map because you'd have to relate all the collections that could be subsets.
@ct - i agree with zortlord!
I do have one more idea that will improve the efficiency of the algorithm:
instead of saving the related dictionary words in an array of strings against a hash key, we can form a trie!!
The hash key becomes a trie root, with the related dictionary strings as trie child nodes.
Hence, if we have 10 hash keys in the dictionary, we form one trie per key, i.e. 10 tries.
Assumptions:
1. This function is only going to be called once. If this were to be called multiple times, it should use preprocessing like building a trie
2. valid characters are ASCII (8 bit representation)
This function will operate in O(n) complexity and O(n) memory (only O(n) to store the results before returning them) where n is the number of words:
public static ArrayList<String> getValidWords(String str, String[] dict){
if(str == null || dict == null){
throw new NullPointerException();
}
int[] strSig = getSig(str);
ArrayList<String> results = new ArrayList<String>();
for(String word : dict){
if(validSig(strSig, word)){
results.add(word);
}
}
return results;
}
private static int[] getSig(String str){
int[] sig = new int[256];
for(char c : str.toCharArray()){
sig[cToI(c)]++;
}
return sig;
}
private static int cToI(char c){
return (int)c;
}
private static boolean validSig(int[] sig, String word){
if(word.length() > 4){
return false;
}
int[] usableSig = new int[256];
for(char c : word.toCharArray()){
int i = cToI(c);
if(usableSig[i] == sig[i]){
return false;
}
usableSig[i]++;
}
return true;
}
If the dictionary file cannot fit in memory, then I would stream the file and pull each individual word out for comparison. This change would only affect the first method, and only really around the for loop.
I like to do that just in case I decide to support limited alphabets (ie: only 26 letters vs the entire 256 set).
I am a bit confused by the if condition
if(usableSig[i] == sig[i]){
return false;
}
doesn't this mean that, for words "ant" and "apple", you return false right away on the first character and never check the rest?
And when you return true, you don't check anything else - just making it out of the for loop means the word can be formed from the given characters?
Please explain, I might be overlooking something here.
#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[10];
    int i = 0;
    int n;
    char dictionary[10][20];
    printf("create a dictionary\n");
    while (i < 10 && scanf("%19s", dictionary[i]) != EOF)
    {
        i++;
    }
    n = i;
    printf("enter the string to be searched\n");
    scanf("%9s", s);
    /* print every dictionary word that starts with s */
    for (i = 0; i < n; i++)
    {
        if (strncmp(s, dictionary[i], strlen(s)) == 0)
        {
            printf("%s\n", dictionary[i]);
        }
    }
    return 0;
}
Assumption: the dictionary is a text file containing all words separated by spaces.
Complexity:
Can be loaded in memory : O(1)
Cannot be loaded in memory: O(n)
So, we'd just sort the input word and make sorted strings of all possible lengths, i.e. from 2 to 4, e.g. ogeg = oegg = oe, og, gg, oeg, ogg, egg. This can be done in constant time.
While building the dictionary from the text file, we can take each word, sort it and add it to a HashMap<String, ArrayList<String>> that stores all the words with a given set of letters. Since this is pre-processing, it can also be treated as constant. All you gotta do now, homie, is look up those sorted strings in the hashtable. So overall constant.
If we cannot load the dictionary into memory, the problem clearly becomes a string-matching problem, which can be solved in O(n) using the Z-algorithm. Just make up all the possible permutations (4! + 4C3*3! + 4C2*2!) and search 'em up!
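The "sorted strings of all possible lengths" step can be sketched like this (`sortedKeys` is my own helper; the set drops duplicates such as the two 'g's in "ogeg" producing the same key twice):

```cpp
#include <algorithm>
#include <set>
#include <string>

// All distinct non-empty sorted sub-multisets of s's letters.
std::set<std::string> sortedKeys(std::string s) {
    std::sort(s.begin(), s.end());
    std::set<std::string> keys;
    int n = (int)s.size();
    for (int mask = 1; mask < (1 << n); ++mask) {
        std::string k;
        for (int i = 0; i < n; ++i)
            if (mask & (1 << i)) k += s[i];  // indices ascend, so k stays sorted
        keys.insert(k);                      // set collapses duplicate masks
    }
    return keys;
}
```

For "ogeg" (sorted: "eggo") this yields 11 distinct keys, each of which is one HashMap lookup.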
Create sorted key to exact match map for keys length 1, 2, 3, 4.
For instance, aab={baa,aba} but not ba
Create a multi-root tree where 4-letter keys are roots, and up to 4 children are generated by removing one of the letters (could be fewer, because in aabc each a is removed only once) - and similarly for the other children, down to one-letter leaves.
Note that this is not really a tree but a graph, because abc is a child of both aabc and abcz, so the number of nodes will not exceed one million.
Walk it from the 4-letter key down to one letter and collect the set of words.
Why is it better than just computing all of the words for each of the 4-letter keys, including subsets? The thing is, the 3-, 2- and 1-letter keys are shared across paths, so there is less repetition. And they can be loaded/unloaded on demand, per memory requirements (for instance, unload the least frequently walked node). Note that the set of words for an exact 4-letter match will be small; it is the subsets that generate a lot, and they are shared.
Maybe this solution is too simple, but from the original text, I assume that:
- The alphabet can only have 26 characters (english lowercase)
- The dictionary is not sorted (so we will need at least O(n) to know its content).
- There is no relation between each dictionary word and the input sequence.
#include <list>
#include <string>
#include <iostream>
#include <fstream>
#include <stdint.h>
class WordPattern
{
public:
// here "useCountLimit" indicates that a dictionary word can not have more than the characters in the input
WordPattern (const std::string &s, bool useCountLimit) : _useCountLimit(useCountLimit)
{
for (int i(0); i < _alphabetSize; ++ i)
_charsCount[i] = 0;
int len = s.size();
for (int i(0); i < len; ++ i)
_charsCount[(s[i]-'a')] ++;
}
inline bool canFormWord (const std::string &s) const
{
uint8_t charsCount [_alphabetSize];
for (int i(0); i < _alphabetSize; ++ i)
charsCount[i] = _charsCount[i];
int len = s.size();
for (int i(0); i < len; ++ i)
{
uint8_t charVal (s[i]-'a');
if (_useCountLimit)
{
if (charsCount[charVal] > 0)
charsCount[charVal]--;
else
return false;
}
else
if (charsCount[charVal] == 0)
return false;
}
return true;
}
private:
static const uint8_t _alphabetSize = 26;
uint8_t _charsCount [_alphabetSize];
bool _useCountLimit;
};
void checkFromFile (const std::string &fileName, const std::string &input, bool useCountLimit = false)
{
WordPattern pattern (input, useCountLimit);
std::ifstream file (fileName.c_str(), std::ifstream::in);
for (std::string line; std::getline(file, line); )
{
if (pattern.canFormWord(line))
std::cout << line << std::endl;
}
file.close();
}
void checkFromList (const std::list<std::string> &words, const std::string &input, bool useCountLimit = false)
{
WordPattern pattern (input, useCountLimit);
std::list<std::string>::const_iterator it;
for (it = words.begin(); it != words.end(); it ++)
{
if (pattern.canFormWord(*it))
std::cout << (*it) << std::endl;
}
}
int main()
{
std::list <std::string> words;
words.push_back ("go");
words.push_back ("egg");
words.push_back ("ego");
words.push_back ("the");
words.push_back ("god");
words.push_back ("movie");
words.push_back ("google");
checkFromList (words, "ogel");
//checkFromFile ("dict.txt", "ogel");
return 0;
}
We can create a hash map of the letters of the original word and their counts, and then for each word in the dictionary check that it consists only of letters from this map, in no greater amounts. This takes constant time per letter.
#include <unordered_map>
#include <string>
#include <vector>
#include <iostream>
using namespace std;
class StringPermutation{
private:
unordered_map<char, int> origLetters;
StringPermutation(){};
public:
StringPermutation(string orig){
for (auto c: orig){
if (origLetters.find(c) != origLetters.end())
origLetters[c]++;
else
origLetters.insert(make_pair(c, 1));
}
}
bool check(string s)
{
unordered_map<char, int> lettersHere(origLetters);
for (auto c: s)
{
if (lettersHere.find(c) != lettersHere.end())
{
lettersHere[c]--;
if (lettersHere[c] == 0)
lettersHere.erase(c);
}else
return false;
}
return true;
}
};
int main()
{
vector<string> dict{"gg", "ego", "ed", "egg", "hhg"};
StringPermutation sp("ogeg");
for (auto s: dict)
{
if (sp.check(s))
cout << s << endl;
}
}
It can be interesting to note that there are only 14950 versions of sorted s: 26!/(4!22!). Knowing Google: no matter what your solution is, I am certain they would want to hear this before you even attempt to discuss the efficiency of your algorithm. Also, knowing them, they gave the 4 letters in the condition for a reason.
The only reason I am saying this: review and practice combinatorics in preparation for the interview. I think they want to see you approximate it on the whiteboard: 26!/(4!22!) = 26*25*23, roughly 30*25*20, which is easy to calculate in your head.
You have the number of possible words wrong, because letters can be repeated. The formula for this is (n + k - 1)!/((k - 1)!n!). In this case n = 4 and k = 26, so there are 29!/(25!4!) = 29*28*27*26/(4*3*2) = 23,751 words. Agree with everything else you said though!
Oops, the formula I gave above assumes the words have length of exactly 4. It didn't account for words of length 3, 2 or 1. If you do account for those there are 27,404 possibilities.
(First rule of correcting someone on the Internet: double check your correction!)
Oh, thanks. I always confuse the Pascal's triangle row with the row in the DP counting matrix. This is part of my practice to eventually stop confusing them.
BTW, the problem did say s is exactly 4 characters, no fewer.
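All three counts above are easy to verify in code (a small sketch; `binom` is my own helper, computing C(n, k) with the usual incremental product so intermediate values stay integral):

```cpp
// Binomial coefficient C(n, k); the number of multisets of size k drawn
// from an alphabet of 26 letters is C(26 + k - 1, k).
long long binom(int n, int k) {
    long long r = 1;
    for (int i = 1; i <= k; ++i)
        r = r * (n - k + i) / i;  // exact: each prefix is C(n-k+i, i)
    return r;
}
```

This reproduces 14950 distinct-letter keys (C(26,4)), 23751 length-4 multisets (C(29,4)), and 27404 multisets of length 1 through 4.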
public static void printDicWords(String input, ArrayList<String> dic){
    if(input == null || dic == null) return;
    getCombination(dic, new StringBuffer(), input, new boolean[input.length()]);
}
public static void getCombination(ArrayList<String> dic, StringBuffer sb, String input, boolean[] used){
    for(int i = 0; i < input.length(); i++){
        if(used[i]) continue; // each position may be used at most once
        used[i] = true;
        sb.append(input.charAt(i));
        if(dic.contains(sb.toString())){
            System.out.println("dic from : " + sb.toString());
        }
        if(sb.length() < input.length()){
            getCombination(dic, sb, input, used);
        }
        sb.deleteCharAt(sb.length() - 1);
        used[i] = false;
    }
}
Some of the other answers here have mentioned building a trie. I think I'd build 2 of them: one out of all possible permutations of the 4 input letters, the other out of all dictionary words up to length 4. Assuming a 26-letter alphabet, there are up to 64 possible input words of length 1, 2, 3 or 4 (not all of them distinct if there are repeated letters), and up to 27,404 possible dictionary words of length 1, 2, 3 or 4. Both tries are small enough that they're guaranteed to fit in memory on any realistic device, regardless of whether the full dictionary is too large.
To get the matching words, you do a preorder traversal of both tries simultaneously, only moving to a child c if it's present in both, and yielding every word you reach.
Building the dictionary trie is O(D) where D is the number of words in the dictionary. We have to visit each word in the dictionary once to build the trie and, because we're only interested in words below a fixed length, insertion into the trie is O(1) for each word.
Building the input trie is O(1) - but with a large constant factor. The number of insert operations we perform is proportional to the number of permutations of the input letters, which is a constant; and each insert operation is O(1) too.
Likewise, matching the input trie against the dictionary trie is O(1) because the size of each trie has a constant upper bound and the time taken is proportional to the smaller of the two tries.
I've written some very elegant code demonstrating this solution to the problem, but unfortunately this margin is too small to contain it. :-)
Solution according to @deepanshu's suggested answer. Generates all permutations and then does binary search over chunks of the dictionary (default chunk size: 50 lines). The dictionary is assumed to contain only lowercase Latin words, sorted and without repetitions.
This solution is also convenient for a large dictionary (when the dictionary can't fit in memory) because of its buffering nature.
#include <fstream>
#include <algorithm>
#include <string>
#include <vector>
using namespace std;
const int buffer_size = 50;
vector<string> chck(const vector<string> &v, const vector<string> &n, int w){
int vsz = int(v.size());
int nsz = int(n.size());
vector<string> res;
if (vsz > 0) {
// as long as current needle is potentially in this segment, try to find it
while (w < nsz && n[w] <= v[vsz-1]) {
// do binary seach over partial dictionary:
// 1) since needle is lexicographically before last word, lower bound exists
// 2) assume no duplicate words in the dictionary
auto lb = lower_bound(v.begin(), v.end(), n[w]);
if (*lb == n[w]) {
res.push_back(n[w]);
}
// go to the next needle (skip repeated)
do {
++w;
} while (w < nsz && n[w-1] == n[w]);
}
}
return res;
}
int main() {
ifstream fin("dict");
ofstream fout("words.txt");
// sort letters
string letters = "ogeg";
int lsz = int(letters.size());
sort(letters.begin(), letters.end());
// generate words to be searched, (1) - init
vector<string> needles;
for (int i = 0; i < int(letters.size()); ++i) {
needles.push_back(string(1, letters[i]));
}
// generate words to be searched, (2) - BFS
// max. 64 words for 4 letters (there are repetitions if letters repeat)
for (int p = 0; p < int(needles.size()); ++p) {
string cn = needles[p];
int cnl = int(cn.size());
// mark all used letters
vector<bool> used(lsz, false);
for (int i = 0; i < cnl; ++i) {
for (int j = 0; j < lsz; ++j) {
if (!used[j] && cn[i] == letters[j]) {
used[j] = true;
break;
}
}
}
// create new needles by appending unused letters to the current needle
for (int j = 0; j < lsz; ++j) {
if (!used[j]) {
needles.push_back(cn + letters[j]);
}
}
}
// sort needles
sort(needles.begin(), needles.end());
// next needle to be searched for
int next_needle = 0;
while (true) {
vector<string> pdict;
string word;
int cnt = 0;
// read in maximum 50 lines and do binary search over them
for ( ; cnt < buffer_size && getline(fin, word); ++cnt) {
pdict.push_back(word);
}
vector<string> picked = chck(pdict, needles, next_needle);
for (auto it = picked.begin(); it != picked.end(); ++it) {
fout << *it << endl;
}
if (cnt < buffer_size) {
break;
}
}
}
Here is a question: why do we need to put the whole dictionary into memory? The goal is to output all words that can be formed from s. From the example, we know that the order doesn't matter. So we only need to construct a hash map and store all four characters of s in it, then read the dictionary from the stream and compare each word with the characters in the hash map. I used another hash map for the current word. Since there are at most four characters in both strings, we don't need much extra space, do we?
However, if the question is, how to store the output words in an efficient data structure, I guess a trie should be a good one.
public static void validWords(String s, String dict) throws FileNotFoundException, IOException{
BufferedReader br = new BufferedReader(new FileReader(dict));
String currLine;
Map<Character, Integer> sMap = new HashMap<Character, Integer>();
for(int i = 0; i < s.length(); i++){
char c = s.charAt(i);
if(!sMap.containsKey(c))
sMap.put(c, 1);
else
sMap.put(c, sMap.get(c) + 1);
}
Map<Character, Integer> word = new HashMap<Character, Integer>();
while((currLine = br.readLine()) != null){
if(currLine.length() > 4)
continue;
word.clear();
boolean valid = true;
for(int i = 0; i < currLine.length(); i++){
char c = currLine.charAt(i);
if(!sMap.containsKey(c)){
valid = false;
break;
}
if(!word.containsKey(c))
word.put(c, 1);
else
word.put(c, word.get(c) + 1);
if(word.get(c) > sMap.get(c)){
valid = false;
break;
}
}
if(valid)
System.out.println(currLine);
}
br.close();
}
What we can do is create permutations of all the strings which can be formed using the input string 'S':
- deepanshu, January 08, 2015
Let's say the string 's' is "ab"
Permutations:
a, b, ab, ba
Now just search for each permutation in the dictionary in log(n) time (binary search, since it's a dictionary and sorted); this takes (number of permutations)*log(n). If the length of s is fixed at 4, then there are only a constant number of permutations, and hence this runs in O(log n) time.
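This approach can be sketched as follows (`allArrangements` and `lookup` are my own helpers; the sorted dictionary is a plain vector, and std::binary_search plays the role of the log(n) lookup):

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// All distinct permutations of every non-empty subset of s's letters.
std::set<std::string> allArrangements(std::string s) {
    std::set<std::string> out;
    std::sort(s.begin(), s.end());
    int n = (int)s.size();
    for (int mask = 1; mask < (1 << n); ++mask) {
        std::string sub;
        for (int i = 0; i < n; ++i)
            if (mask & (1 << i)) sub += s[i];
        // sub is sorted, so next_permutation cycles through all orderings
        do {
            out.insert(sub);
        } while (std::next_permutation(sub.begin(), sub.end()));
    }
    return out;
}

// Binary-search each candidate in the sorted dictionary.
std::vector<std::string> lookup(const std::string& s,
                                std::vector<std::string> dict) {
    std::sort(dict.begin(), dict.end());
    std::vector<std::string> found;
    for (const std::string& w : allArrangements(s))
        if (std::binary_search(dict.begin(), dict.end(), w))
            found.push_back(w);
    return found;
}
```

For s = "ab" and a dictionary containing "ab", "ba", "b" and "cab", the candidates are {a, b, ab, ba} and the lookup returns "ab", "ba" and "b".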