Find compound words input->

Interview Question

0

of 0 votes

8
Answers
Find compound words

input-> String (example "he has to go in a wheelchair")
output -> wheelchair = wheel chair

assume that you have a big dictionary to look up the words.

Please help with this question asked in interview?. I want to also understand why a particular data structure was chosen and what is the time complexity.
- Algorithmy November 07, 2014 in United States | Report Duplicate | Flag | PURGE

Email me when people comment.

An error occurred in subscribing you.

Country: United States

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 1 vote

1.Store the dictionary words in a Trie T.

2.Take input string S and split it with " " to get individual words w in the input string.

3.For each input word w, check if w is present in the Trie T of step 1. If yes, move to the next word. Do this for next word till it is found in the T else go to step 4.

4. Let the word w be of length N then do the following:

boolean isCompundWord(String w){
	int N = w.length();
	for(int i=1;i<N-1;i++){
		String x = w.subString(0, i);
		String y = w.subString(i+1, N-1);
		if(T.contains(x) && T.contains(y)){
			return true;
		}
	}
	return false;
}

5.Print all words w which return from above method.

- Ashutosh November 07, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Thanks for replying. Any particular reason to use Trie?

- Algorithmy November 07, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Thank you, this is perfect solution. Please note that my dictionary is limited here. I used PatriciaTrie in Apache commons collections and here is working code -

import java.util.StringTokenizer;

import org.apache.commons.collections4.Trie;
import org.apache.commons.collections4.trie.PatriciaTrie;

/**
 * find compound words
 *
 */
public class CompoundWords {
	
	/**
	 * compound words
	 * @param sentence
	 */
	private void isCompundWord(String sentence) {
		// dictionary
		Trie<String, String> trie = new PatriciaTrie<String>();
		trie.put("a", "a");
		trie.put("awhile", "awhile");
		trie.put("while", "while");
		trie.put("wheel", "wheel");
		trie.put("wheelchair", "wheelchair");
		trie.put("chair", "chair");

		//get the tokens in the sentence
		StringTokenizer st = new StringTokenizer(sentence);
		while (st.hasMoreTokens()) {
			String word = st.nextToken();

			int N = word.length();
			for (int i = 1; i < N; i++) {
				String x = word.substring(0, i);
				String y = word.substring(i, N);
				if (trie.containsValue(x) && trie.containsValue(y)) {
					//match
					System.out.println(word + ":" + x + " " + y);
				}
			}
		}
	}
	
	/**
	 * test program
	 * @param args
	 */
	public static void main(String[] args){
		CompoundWords cw = new CompoundWords();
		cw.isCompundWord("Is he on a wheelchair");
		
		cw = new CompoundWords();
		cw.isCompundWord("It took me awhile to adjust to it, too.");
	}
}

- Algorithmy November 08, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

This approach won't work in case the word is a compound of 3 or more dictionary words.

- Amr Gamal November 09, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

First, Im assuming theres a dictionary data structure that implemented the following necesary methods.

class WordDictionary{
	bool Create(string fileName);//Loads all the words from a file in the disk into the HashTable
	bool isWord(string w); returns true if word in the dictionary, ideally in constant time, probably implemented with a hash table
}

The main idea behind the algorithm is to process every word in the input sentence, where words are separated by an space or ASCII character 32.

For each word... traverse every character in the word apeending it to a temp string variable, and ask the dictionary if the word exists... if it does, add it to your result, and start from the next available character. if it does not, move onto the next character and repeat the process.

The final result will be stored in map, where the key is the composite word (wheelchair), and the values are the simple words (wheel,chair)

void printCompoundWords(const string &sentence,Dictionary &d){
 map<string,vector<string>> result;//dictionary with key-map asociation  
 int wordCount = getWordCount(sentence);//returns the amount of whitespaces +1, O(n)
  for(int i =0; i<wordCount; i++){ O(m) where m is the number of words in this sentence
     string current = getWordAt(sentence, i);
     int len = current.length();
     string partialWord;
     for (int j = 0; j< len; j++){
         partialWord.append (current[j] );//add next char
        if ( dic.isWord( partialWord ) {
		result[currentWord].push_back(partialWord);
                partialWord = "";
	}//for word len
     }//for word count
  } 
printResult(result);//iterates through the std map printing the values in the vectors

}

The runtime is O(n), in fact its O(n) + O(m) + O(k) where k<= 16, cause in english I dont now of any word thats greater than 16 characters in lenght, therefore the O(k) loop (the ones that goes character by character) could be perceived as constant time.

There are a few optimizations that could be done to the code, for instance, instead of the method, getWordCount() the algorithm could assemble the words "on the fly" stoping when it find a space or the end of string.

- RED November 07, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Using Aho-Corasick trie would be the best approach. Build trie using all directionally. Input each words and return matched. For example, input wheelchair will return wheel, chair, wheelchair, heel, hair, eel, air, he etc. You know what to do next.

- d41515 November 07, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I am not worried run time. My only concern is the large memory for required for converting
a LARGE dictionary into trie data structure.

- Anonymous November 08, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

O(n*m*k) where k is the size of the dictionary, m is the maximum length of a word in the dictionary, and n is the length of the input String.

import java.util.HashSet;
import java.util.Set;

public class Main{
	
	
	public static void main(String[] args){
		Set<String> testPhrases = new HashSet<String>();
		testPhrases.add("I am using a hashset.");
		testPhrases.add("My bank account is overdrawn.");
		testPhrases.add("I do not use any compound words.");
		testPhrases.add("I have a doubleset of doublewords.");
		
		Set<String> dictionary = new HashSet<String>();
		dictionary.add("hash");
		dictionary.add("set");
		dictionary.add("over");
		dictionary.add("drawn");
		dictionary.add("double");
		dictionary.add("set");
		dictionary.add("word");
		dictionary.add("words");
		
		for(String phrase : testPhrases){
			findCompounds(phrase, dictionary);
		}
	}
	
	private static void findCompounds(String phrase, Set<String> dictionary){
		for(String subphrase : phrase.split("\\s+")){
			findCompound(subphrase.replace(".", "").toLowerCase(), dictionary);
		}
	}
	
	private static void findCompound(String token, Set<String> dictionary){
		Solution[] isFinishableFromHere = new Solution[token.length()];
		for(int i = token.length() - 1; i >= 0; i--){
			for(String word : dictionary){
				if(token.length() - i < word.length()){
					continue;
				}
				boolean match = true;
				for(int j = 0; j < word.length(); j++){
					if(!word.substring(j, j + 1).equals(token.substring(i + j, i + j + 1))){
						match = false;
						break;
					}
				}
				if(match){
					boolean solved = i + word.length() >= token.length() ? true : isFinishableFromHere[i + word.length()] != null;
					if(solved){
						isFinishableFromHere[i] = new Solution(word);
					}
					
					if(isFinishableFromHere[i] != null){
						break;
					}
				}
			}
		}
		if(isFinishableFromHere[0] != null && isFinishableFromHere[0].getPhrase().length() < token.length()){
			System.out.print(token + " = ");
			printSolution(isFinishableFromHere);
			System.out.println();
		}
	}

	private static void printSolution(Solution[] solns){
		int index = 0;
		Solution cur = solns[0];
		while(cur != null){
			System.out.print(cur.getPhrase() + " ");
			if(index + cur.getPhrase().length() >= solns.length){
				cur = null;
			}
			else{
				cur = solns[index + cur.getPhrase().length()];
			}
		}
	}
	
	private static class Solution{
		private String phrase;
		
		public Solution(String phrase){
			this.phrase = phrase;
		}
		
		public String getPhrase(){
			return phrase;
		}
	}
}

- cubed November 09, 2014 | Flag Reply

CareerCup

Interview Question

Books

Videos

Resume Review

Mock Interviews