IBM Interview Question
Software Engineer / DevelopersHere is an O(n^2) solution.
Assume the dictionary of words is provided as a hashtable, which has an O(1) lookup. Then, cost to find all possible valid words in the given string? O(n^2). This is important, because the worst case of my algorithm is to visit all possible valid words once.
function getValidWords(i) returns [list, boolean]
if we have reached end of the string
return [empty_list, true]
let js = start from index i, find an array of j's
such that the span [i, j] is a valid word.
for each j in js
let [lista, boola] = getValidWords(j)
if boola is true then
let listb = span[i, j] ++ lista
return [listb, true]
else
return [empty_list, false]
end for
end function
<pre lang="java" line="1" title="CodeMonkey949" class="run-this">import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;
public class DP4 {
public static void main(String args[]) throws FileNotFoundException {
ArrayList<String> aDictionary = new ArrayList<String>();
Scanner sc = new Scanner(new File("E:\\Notes\\twl06.txt"));
while (sc.hasNext()) {
aDictionary.add(sc.next());
}
String sentence = "$itwasthebestoftimes";
correctSentence(sentence, aDictionary);
}
private static void correctSentence(String sentence,
ArrayList<String> aDictionary) {
// TODO Auto-generated method stub
int n = sentence.length();
boolean[] words = new boolean[n];
words[0] = true;
int[] path = new int[n];
for (int i = 1; i < n; i++) {
for (int k = 0; k < i; k++) {
if (words[k]
&& aDictionary.contains(sentence
.substring(k + 1, i + 1))) {
path[i]=k;
words[i] = true;
}
}
}
printSentence(sentence,path,n-1);
}
private static void printSentence(String sentence,int[] path, int n) {
// TODO Auto-generated method stub
if(n==0)return;
printSentence(sentence, path,path[n]);
System.out.print(" "+sentence.substring(path[n]+1,n+1));
}
}
</pre><pre title="CodeMonkey949" input="yes">
</pre>
Recurrence
T[i] = T[k] & isValidWord(k+1,i); where k=0 to i;
T[0] = True;
T is boolean vector.
Where T[i] means sentence can be form with i char of given string.
isValidWord(k+1,i) means word formed with char start from position k+1 to i both inclusive present in dictionary or not. If yes then true else false.
your code fails for the dictionary:
mam
am
ample
pleased
ease
lease
leased
and sentence as:mampleased
output given by your code is: am pleased
which is wrong and rather it should have been:mam pleased.
mayank's code can be modified at the cost of space complexity.
the way in dat code we are looking for all the substrings present in dictionary ,we must save the start and end index(in a 2-D array or arraylist<T>) of all the found words in a dictionary as we process the string.later we can print them with invarient that start index of next words must be one more than end index of previous word. how about this idea?? the time compexity still remains O(n2).
explanation on chirangit's idea of trie:
Preprocessing: insert all entries from the dictionary into a trie.
Algo: 1) Take characters one by one from the beginning of the given text and look up the sequence in the trie, As soon as you hit a leaf or a character that cannot be appended to the current sequence, back track and take the latest valid word from the trie.
- Note that here we take the longest possible word. e.g "hereby" would be parsed as one word.
2) Start a new search from the last offset a valid word was detected.
3) If no valid sequence can be found at some point, back-track to the last time we had the option to choose a shorter word, and make a different choice.
Memorization: In order to avoid re-calculating the same sub-problem during back-tracks, the offset where the longest word was not the good choice needs to be memorized so that any time we hit that offset again as the beginning of a word, we know this is not going to end up in a successful split.
Runtime: O(n^2) since in a straight forward situation, we consume the given text once, but in worst case, we need to back track "n" times and re-process from the beginning.
Hmm, I think the way the question is worded there's more clarification needed.
Ex:
dictionary: hell, low, world
string: helloworld
Is this string "composed of valid words"? I would say yes, even though there's overlap and we cannot separate them with spaces. The recursion would be:
IsValidString(i) = IsValidWord(0,i) OR IsValidString(j) AND IsValidWord(k,i)
where j < i and k <= j+1
- try August 24, 2010