Microsoft Interview Question
Software Engineer / DevelopersTeam: Bing
Country: United States
Interview Type: In-Person
Take a look at topcoder tutorial on Rabin-Karp string matching, for now.
I will explain what I mean in the next few hours ;-)
First we will calculate hash value of the pattern and keep it in a variable. Let L be the length of the pattern we are trying to find in the stream. First we will get the first L characters of the stream and store it in a linked list. Then we will take the hash value of those characters and compare it to the hash value of the pattern to find if we have a match. After that we will erase the head of the linked list and update the hash value. Then we will add the next character of the stream to the end of the list and update the hash value. Thereafter, we will compare those two hash values. We will continue this process until we reach the end of the stream.
By triple hashing I mean that we will devise three different hash functions. At any point, if all three hash values are equal to those of the pattern, we can somehow be sure that we have a match.
Topcoder tutorial explains much more and much better than my explanation. You can find how the hash values are calculated in detail. Every other thing that I said is explained thoroughly in the article.
Unfortunately, the site doesn't allow copying links in the comments, google "Rabin-Karp string matching topcoder tutorial"
The hard part of the question is that, you can't store O(P) chars of the stream.
However, the suggestion was "could have used some better DS.."
I wonder how to use a DS with less than O(P)-space. There's something missing in the requirement, I think.
Ouch! You're right :-( We can do something, though. When we have computed the hash values for the pattern, discard the space used by it and replace it with the list I was talking about, we can even implement the list on the char array previously used by P. I do not really get it, even the KMP automaton would need O(P) space. How can we not use the space used previously by P?
pattern = "aba"
matches = [];
while(1) {
var c = getStream();
var i = matches.length;
while (i--) {
var rx = new RegEx("^" + matches[i] + c);
if (pattern.match(rx)) {
matches[i] += c;
if (pattern === matches[i]) {
console.log("Match Found");
matches.splice(i,1);
}
} else {
matches.splice(i,1);
}
}
if (c===pattern[0]) matches.push(c);
}
As I understand the requirement to use "some better DS" implies to achieve O(1) space complexity instead of O(Length(P)) in KMP.
As the problem statement tells nothing about time complexity, the O(n*Length(P)) complexity in worst case should be acceptable. We can use a naive search. It will satisfy the both of the requirements: O(1) space complexity and access to only one char of stream at a time.
The trick here will be to use a prefix of the pattern instead of re-reading the previous characters from the stream, when we know that several first characters of the pattern match a part of the stream. That will help to satisfy the second requirement (access only one char of stream at a time).
This is my solution. (I've not tested it, so it may contains errors):
int Search(SomeStream& stream, const string& pattern) {
int stream_pos = 0;
int matching_chars = 0;
while (!stream.Eof()) {
char ch = stream.ReadNextChar();
if (ch == pattern[matching_chars]) {
if (++matching_chars == pattern.length())
return stream_pos - matching_chars + 1;
} else {
if (matching_chars > 0) {
int prev_matching_chars = matching_chars;
// The trick: use the pattern itself to search for a partial match instead
// of re-reading the previous characters from the stream
int pattern_pos;
for (pattern_pos = 1; pattern_pos <= matching_chars; ++pattern_pos) {
int chars_to_compare = matching_chars - pattern_pos + 1;
// Using substr() here is inefficient. I use it to keep the code
// easier to understand
if (pattern.substr(0, chars_to_compare)
== pattern.substr(pattern_pos, chars_to_compare))
{
matching_chars = chars_to_compare;
break;
}
}
if (pattern_pos > prev_matching_chars)
matching_chars = 0;
}
}
++stream_pos;
}
return -1;
}
What about hashing? I think a triple hash would do the job.
- iroodaz May 02, 2014