Interview Question
Country: United States
Thank you, this is perfect solution. Please note that my dictionary is limited here. I used PatriciaTrie in Apache commons collections and here is working code -
import java.util.StringTokenizer;
import org.apache.commons.collections4.Trie;
import org.apache.commons.collections4.trie.PatriciaTrie;
/**
* find compound words
*
*/
public class CompoundWords {
/**
* compound words
* @param sentence
*/
private void isCompundWord(String sentence) {
// dictionary
Trie<String, String> trie = new PatriciaTrie<String>();
trie.put("a", "a");
trie.put("awhile", "awhile");
trie.put("while", "while");
trie.put("wheel", "wheel");
trie.put("wheelchair", "wheelchair");
trie.put("chair", "chair");
//get the tokens in the sentence
StringTokenizer st = new StringTokenizer(sentence);
while (st.hasMoreTokens()) {
String word = st.nextToken();
int N = word.length();
for (int i = 1; i < N; i++) {
String x = word.substring(0, i);
String y = word.substring(i, N);
if (trie.containsValue(x) && trie.containsValue(y)) {
//match
System.out.println(word + ":" + x + " " + y);
}
}
}
}
/**
* test program
* @param args
*/
public static void main(String[] args){
CompoundWords cw = new CompoundWords();
cw.isCompundWord("Is he on a wheelchair");
cw = new CompoundWords();
cw.isCompundWord("It took me awhile to adjust to it, too.");
}
}
First, Im assuming theres a dictionary data structure that implemented the following necesary methods.
class WordDictionary{
bool Create(string fileName);//Loads all the words from a file in the disk into the HashTable
bool isWord(string w); returns true if word in the dictionary, ideally in constant time, probably implemented with a hash table
}
The main idea behind the algorithm is to process every word in the input sentence, where words are separated by an space or ASCII character 32.
For each word... traverse every character in the word apeending it to a temp string variable, and ask the dictionary if the word exists... if it does, add it to your result, and start from the next available character. if it does not, move onto the next character and repeat the process.
The final result will be stored in map, where the key is the composite word (wheelchair), and the values are the simple words (wheel,chair)
void printCompoundWords(const string &sentence,Dictionary &d){
map<string,vector<string>> result;//dictionary with key-map asociation
int wordCount = getWordCount(sentence);//returns the amount of whitespaces +1, O(n)
for(int i =0; i<wordCount; i++){ O(m) where m is the number of words in this sentence
string current = getWordAt(sentence, i);
int len = current.length();
string partialWord;
for (int j = 0; j< len; j++){
partialWord.append (current[j] );//add next char
if ( dic.isWord( partialWord ) {
result[currentWord].push_back(partialWord);
partialWord = "";
}//for word len
}//for word count
}
printResult(result);//iterates through the std map printing the values in the vectors
}
The runtime is O(n), in fact its O(n) + O(m) + O(k) where k<= 16, cause in english I dont now of any word thats greater than 16 characters in lenght, therefore the O(k) loop (the ones that goes character by character) could be perceived as constant time.
There are a few optimizations that could be done to the code, for instance, instead of the method, getWordCount() the algorithm could assemble the words "on the fly" stoping when it find a space or the end of string.
O(n*m*k) where k is the size of the dictionary, m is the maximum length of a word in the dictionary, and n is the length of the input String.
import java.util.HashSet;
import java.util.Set;
public class Main{
public static void main(String[] args){
Set<String> testPhrases = new HashSet<String>();
testPhrases.add("I am using a hashset.");
testPhrases.add("My bank account is overdrawn.");
testPhrases.add("I do not use any compound words.");
testPhrases.add("I have a doubleset of doublewords.");
Set<String> dictionary = new HashSet<String>();
dictionary.add("hash");
dictionary.add("set");
dictionary.add("over");
dictionary.add("drawn");
dictionary.add("double");
dictionary.add("set");
dictionary.add("word");
dictionary.add("words");
for(String phrase : testPhrases){
findCompounds(phrase, dictionary);
}
}
private static void findCompounds(String phrase, Set<String> dictionary){
for(String subphrase : phrase.split("\\s+")){
findCompound(subphrase.replace(".", "").toLowerCase(), dictionary);
}
}
private static void findCompound(String token, Set<String> dictionary){
Solution[] isFinishableFromHere = new Solution[token.length()];
for(int i = token.length() - 1; i >= 0; i--){
for(String word : dictionary){
if(token.length() - i < word.length()){
continue;
}
boolean match = true;
for(int j = 0; j < word.length(); j++){
if(!word.substring(j, j + 1).equals(token.substring(i + j, i + j + 1))){
match = false;
break;
}
}
if(match){
boolean solved = i + word.length() >= token.length() ? true : isFinishableFromHere[i + word.length()] != null;
if(solved){
isFinishableFromHere[i] = new Solution(word);
}
if(isFinishableFromHere[i] != null){
break;
}
}
}
}
if(isFinishableFromHere[0] != null && isFinishableFromHere[0].getPhrase().length() < token.length()){
System.out.print(token + " = ");
printSolution(isFinishableFromHere);
System.out.println();
}
}
private static void printSolution(Solution[] solns){
int index = 0;
Solution cur = solns[0];
while(cur != null){
System.out.print(cur.getPhrase() + " ");
if(index + cur.getPhrase().length() >= solns.length){
cur = null;
}
else{
cur = solns[index + cur.getPhrase().length()];
}
}
}
private static class Solution{
private String phrase;
public Solution(String phrase){
this.phrase = phrase;
}
public String getPhrase(){
return phrase;
}
}
}
1.Store the dictionary words in a Trie T.
2.Take input string S and split it with " " to get individual words w in the input string.
3.For each input word w, check if w is present in the Trie T of step 1. If yes, move to the next word. Do this for next word till it is found in the T else go to step 4.
4. Let the word w be of length N then do the following:
5.Print all words w which return from above method.
- Ashutosh November 07, 2014