Google Interview Question
Software EngineersCountry: United States
Interview Type: In-Person
Assume L contains "abxyz" and "xyw"
Also assume your input is "abxyw".
You can't recognize "xyw" in O(n) with trie.
A more complicated cyclic graph is needed.
To recognize the above with trie need to keep set of active paths.
(Sorry for my English)
@boaznahum: You don't need to do that as "abxyz" and "xyw" both will be stored in try as
|
a x
b y
x w
y
z
So let's take your input "abxyzw". We start the search and we find we have all the element till 'x'. Moment you get 'x' you have two options:
1. Check if the current element is equal to the head of trie and If it is equal to head of trie then call recursive search.
2. Continue till the end of current word. In this case for your given input it will return false but for the recursive search we started in point 1, it will return true.
So it is basically your search function that you need to write properly. However you had found a test case which is extremely crucial to implement the search function.
Code based on above idea.
#define SIZE 26
struct tri{
int complete;
struct tri *child[SIZE];
};
void insert(char *c, struct tri **t)
{
struct tri *current = *t;
while(*c != '\0')
{
int i;
int letter = *c - 'a';
if(current->child[letter] == NULL) {
current->child[letter] = malloc(sizeof(*current));
memset(current->child[letter], 0, sizeof(struct tri));
}
current = current->child[letter];
c++;
}
current->complete = 1;
}
struct tri *t;
int flag = 0;
int found(char *c, struct tri *tt)
{
struct tri *current = tt;
if (current == NULL)
return 0;
while(*c != '\0')
{
int i;
int letter = *c - 'a';
/* if this is the first char then recurse from begining*/
if (t->child[letter] != NULL)
flag = found(c+1, t->child[letter]);
if (flag == 1)
return 1;
if(!flag && current->child[letter] == NULL) {
return 0;
}
current = current->child[letter];
c++;
}
return current->complete;
}
int main()
{
int i;
t = malloc(sizeof(*t));
t->complete = 0;
memset(t, 0, sizeof(struct tri));
insert("weathez", &t);
insert("eather", &t);
insert("weather", &t);
(1 ==found("weather", t))?printf("found\n"):printf("not found\n");
return 0;
}
What if the input is this stream: "xyzdisagreementabc", and the dictionary has the words "agreement", "disagreement", "agree" and "disagree"
The above suggestions won't work here because when you reach 'd', you will start comparing it with your trie and returns the words "disagreement" and "disagree", not knowing that *within* this search, we also need to detect that the 'a' should start the search for "agree" and "agreement"
So my proposal would be to start a new search at every character, and also update every previous search at every character. But this means updating all your previous searches everytime you come across a new character. This will detect all words, but it won't be linear because the number of concurrent searches is now another variable.
So here is my solution.
1. Create a Trie for the list of Strings given.
2. Store the maximum depth of the Trie or in other words, the length of the longest string in a variable called MaxLen.
3. When ever we receive a new character, we search it and the combination of all its previous MaxLen-1 characters. Because we know that any pattern more than MaxLen characters will not be found in the Trie anyway.
4. Every time we find a word, we print the index of the pattern match, however we do not stop the search. The only criteria to stop the search would be when the length of stream under consideration would reach MaxLen value. Then we will remove the first character from the StringBuilder and add the new character at the end and search for all the combinations for that StringBuilder. Which would be MaxLen combinations.
So for every new character you are doing MaxLen search. Assuming there are N characters in the stream, the complexity of this solution is O(MaxLen*n).
This is not linear time. I don't believe there could be any linear time solution possible for this. Since we at least need to match every single character of the stream with all the characters of the words in the list for the worst case. Which would look like this:
List : {a, aa, aaa, aaaa, aaaaa, aaaaaa};
Stream: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
I will post the code soon.
We can actually construct finite automata in memory for each word in the list. If the list total words are n then precomputing complexity is n * max_length_of_word*26(assuming only lower case letters). Now current states for each and every word can be maintained in an array
when a new letter comes update all the states depending on the structure of the automata
if any state reaches a final one then call the api at that moment
Keep L in a trie, and iterate searches in a parallel way, as such:
1. For each search position in our search list:
1.1. If the next character is still in the trie - advance the search position to the corresponding position in the trie. If it hit a complete word - call the external API.
1.2. Otherwise, remove the search from the list.
2. Add a new position to the list. That position is the node following from the root, given the current character.
Well, we can use tries as mentioned above. But I feel that regular expressions can also be used to solve this problem. If the dictionary L is fixed then I believe that this is useful.
Here is my code
import re
def inputs(stream,dicts):
outputs = []
str = ""
for i in dicts:
matchs = re.compile(i)
outputs.extend(re.findall(matchs,stream))
for i in outputs:
str = str+ " "+i
return str
L = ['ok','testing','one','try','trying']
print inputs('oktestingdonehereandtherejustgiveitatryandiamtryingtogetsolution',L)
#Result : ok testing one try try trying
My C# implementation, Uses a Tuple to store the string and testing index, also consider the case that a word can be just a single char
// Test code
char[] stream = { 'a', 'b', 'c', 'o', 'k', 'h', 'd', 'e', 'f', 't', 'r', 'y', 'i', 'n', 'g' };
string[] words = { "ok", "test", "one", "try", "trying", "h"};
var list = new List<string>(words);
FindWords(stream, list);
public void FindWords(char[] stream, List<string> list)
{
// Word to test and index
List<Tuple<string, int>> words = new List<Tuple<string, int>>();
foreach (var current in stream)
{
for (int i=0; i < words.Count; i++)
{
var item = words[i];
var word = item.Item1;
int index = item.Item2 + 1;
if (word[index] == current)
{
if (word.Length == (index + 1))
{
CallAPI(word);
words.RemoveAt(i--);
}
else
{
words[i] = Tuple.Create<string, int>(word, index);
}
}
else
{
words.RemoveAt(i--);
}
}
foreach (var word in list)
if (word[0] == current)
{
if (word.Length == 1)
CallAPI(word);
else
words.Add(Tuple.Create<string, int>(word, 0));
}
}
}
public void CallAPI(string word)
{
Console.WriteLine(word);
}
using System;
using System.Collections.Generic;
delegate void CallAPIH(string w);
class WRec
{
string word;
int c = -1;
public WRec(string w)
{
this.word = w;
c = -1;
}
public string getW()
{
return word;
}
public int check(char ch)
{
c++;
if (word[c] == ch)
if (c == word.Length - 1)
{
c = -1;
return 2;
}
else
return 1;
else
{
c = -1;
return 0;
}
}
}
class Recognizer
{
Dictionary<char, List<string>> ind = new Dictionary<char, List<string>>();
List<WRec> r = new List<WRec>();
CallAPIH f;
public Recognizer(string[] words, CallAPIH f)
{
this.f = f;
foreach (string w in words)
{
if (w.Length == 0)
continue;
List<string> l;
if (!ind.TryGetValue(w[0], out l))
{
l = new List<string>();
ind.Add(w[0], l);
}
l.Add(w);
}
}
public void check(char ch)
{
List<string> l;
if (ind.TryGetValue(ch, out l))
{
foreach (string w in l)
{
r.Add(new WRec(w));
}
}
for (int i = r.Count - 1; i >= 0; i--)
{
switch (r[i].check(ch))
{
case 0:
r.RemoveAt(i);
break;
case 2:
f(r[i].getW());
r.RemoveAt(i);
break;
}
}
}
}
class Program
{
static void Main(string[] args)
{
string[] w = { "ok", "test", "one", "try", "trying" };
string stream = "abcokdeftrying";
Recognizer r = new Recognizer(w, delegate(string s)
{
Console.WriteLine("Detected: " + s);
}
);
foreach (char ch in stream)
{
Console.WriteLine(ch);
r.check(ch);
}
}
}
c++
//iterate over stream
for (auto currentChar : charStream) {
printf("current: %c\n", currentChar);
//always add root to potentially start a new word
possibleWords.push(root);
int size = possibleWords.size();
//iterate over potential words
for (int i = 0; i < size; ++i) {
auto possibleWord = possibleWords.front();
possibleWords.pop();
auto next = possibleWord->getNext(currentChar);
if (next.get() != nullptr) {
//got a letter that can extend current word
if (next->isWord) callApi();
possibleWords.push(next);
}
}
}
1.Maintain a list of all words having same last character(create a list for each alphabet).
2.Construct a string with incoming stream of characters(let its name be superString). So the size of superString increases every time a new character enters the stream.
3.When a new character enters the stream, consider that as a last character of a word and retrieve the list of words(call it as wordList) that has the particular character as the last character.
4. For every word in wordList check whether superString ends with that particular word.
4.1 If the condition is true,Call the API
public static void callApiOnStream(char[] stream,String[] words){
int[] wordCharIndex = new int[words.length];
for(int i=0;i<stream.length;i++){
for(int j=0;j<words.length;j++){
if(words[j].charAt(wordCharIndex[j]) == stream[i]){
wordCharIndex[j]++;
if(wordCharIndex[j] == words[j].length()){
//callApi(words[j]);
System.out.println("Calling api for word: "+words[j]);
wordCharIndex[j] = 0;
}
}else{
wordCharIndex[j] = 0;
}
}
}
}
public static void callApiOnStream(char[] stream,String[] words){
int[] wordCharIndex = new int[words.length];
for(int i=0;i<stream.length;i++){
for(int j=0;j<words.length;j++){
if(words[j].charAt(wordCharIndex[j]) == stream[i]){
wordCharIndex[j]++;
if(wordCharIndex[j] == words[j].length()){
//callApi(words[j]);
System.out.println("Calling api for word: "+words[j]);
wordCharIndex[j] = 0;
}
}else{
wordCharIndex[j] = 0;
}
}
}
}
Without the need of a BST we could solve it with an array keeping the index for each char to be compared.
public static void callApiOnStream(char[] stream,String[] words){
int[] wordCharIndex = new int[words.length];
for(int i=0;i<stream.length;i++){
for(int j=0;j<words.length;j++){
if(words[j].charAt(wordCharIndex[j]) == stream[i]){
wordCharIndex[j]++;
if(wordCharIndex[j] == words[j].length()){
//callApi(words[j]);
System.out.println("Calling api for word: "+words[j]);
wordCharIndex[j] = 0;
}
}else{
wordCharIndex[j] = 0;
}
}
}
}
Sorry I was trying to post without an account and it posted more than once.
How about something like this?
The idea is that with each character we get in we build a character chain and match if that chain exists as a word (or partial word) in our dictionary.
So with each character that goes in, we get all possible words of it, plus valid joined results from previous character chains. If a chain doesn't partially match to a single word anymore, we clean it up.
If I would deploy this kind of code, I would probably rewrite check_dict to create a tree with all words for faster access.
l = ["ok", "test", "one", "try", "trying"]
c = ["a", "b", "c", "o", "k", "d", "e", "f", "t", "r", "y", "i", "n", "g"]
def check_dict(word):
for w in l:
if w.startswith(word):
if word == w:
return 2
return 1
return 0
chains = []
for character in c:
chains.append([])
i = 0
for chain in chains:
if chain is None:
continue
chain.append(character)
if check_dict(''.join(chain)) == 2:
print("Found word: %s" % ''.join(chain))
elif check_dict(''.join(chain)) == 1:
# partial match
pass
else:
# Resolve unneeded chain
chains[i] = None
i += 1
The idea is to use Trie structure to build up the dictionary. Feed the live stream into the dictionary (assuming a new character is coming at a time). Go down deep into the tree (Trie data structure) one more level given the newly coming character. Keep a list of valid tree paths and go through the list when a new character is coming.
- Generate signal to call the API, if a new word is found with the new character.
Keep this tree path because following character(s) can form new words.
- Keep this tree path in the list, if does not find a new word but the tree path is valid.
Because it is one of candidates to form a new word with the following stream.
- Remove this tree path from the list, if the tree path is invalid.
- Search the new character in the root of dictionary
- Call the API and add the tree path into the list, if this single character is a word.
- Add the tree path into the list, if the tree path is valid.
TEST
void TestLiveStream()
{
const std::vector<std::string> words =
{"ok", "test", "one", "try", "trying"};
LiveStreamWordsDetector lsd(words);
const std::string input("abcokdeftrying");
std::vector<std::string> output;
lsd.AppendLiveStream('a', output);
assert(output.empty());
lsd.AppendLiveStream('b', output);
assert(output.empty());
lsd.AppendLiveStream('c', output);
assert(output.empty());
lsd.AppendLiveStream('o', output);
assert(output.empty());
lsd.AppendLiveStream('k', output);
assert(output.size() == 1);
assert(output[0] == "ok");
lsd.AppendLiveStream('d', output);
assert(output.empty());
lsd.AppendLiveStream('e', output);
assert(output.empty());
lsd.AppendLiveStream('f', output);
assert(output.empty());
lsd.AppendLiveStream('t', output);
assert(output.empty());
lsd.AppendLiveStream('r', output);
assert(output.empty());
lsd.AppendLiveStream('y', output);
assert(output.size() == 1);
assert(output[0] == "try");
lsd.AppendLiveStream('i', output);
assert(output.empty());
lsd.AppendLiveStream('n', output);
assert(output.empty());
lsd.AppendLiveStream('g', output);
assert(output.size() == 1);
assert(output[0] == "trying");
}
Create trie out of the list and navigate the nodes as you read from stream in linear time.
- ashish.kaila November 05, 2015