Flipkart Interview Question
SDE-2sCountry: United States
a. In a loop, iterate over substrings of string1. each substring starts from index 0 and of length(str2.length, str2.length+ 1... str1.length)
b. For each such substring find out the part which contains string 2 as its subsequence
c. If length of that window is smaller than one we have already, store it later comparison
d. output smallest such window.
Complexity: O(n^2)
[Looks like this approach can be optimized for shorted substring search]
Below is the code in C#:
MinimalLengthSubseqToContainStr("abcbcbbbeeeedfaaccmmopopopn", "bedm");
private static void MinimalLengthSubseqToContainStr(string str1, string str2)
{
int minWindowLen = Int32.MaxValue;
string result = string.Empty;
for (int i = str1.Length -1; i >= str2.Length - 1; i--)
{
string minWindow = FindSecondStringInFirst(str1.Substring(0, i +1), str2);
if (!string.IsNullOrEmpty(minWindow))
{
if (minWindow.Length < minWindowLen)
{
minWindowLen = minWindow.Length;
result = minWindow;
}
}
}
Console.WriteLine(result);
}
private static string FindSecondStringInFirst(string str1, string str2)
{
int count = str2.Length - 1;
int index = str1.Length - 1;
int end = -1;
int start = -1;
while (count >= 0)
{
if (index < 0)
{
break;
}
if (str1[index] == str2[count])
{
if(count == str2.Length - 1)
end = index;
if (count == 0)
{
start = index;
break;
}
count--;
}
index--;
}
if (end > start)
return str1.Substring(start, end - start + 1);
else
return string.Empty;
}
@manu.machilles:
You can optimize your solution as explained below:
"abcbcbbbeeeedfaaccmmopopopn", "bedm"
1. The first call to FindSecondStringInFirst(string str1, string str2) with "abcbcbbbeeeedfaaccmmopopopn", and "bedm". And you get minWindow as beeeedfaaccmm. Now instead of making second call to FindSecondStringInFirst with "abcbcbbbeeeedfaaccmmopopop", and "bedm" , call should be with "abcbcbbbeeeedfaaccm", and "bedm".
2. Once you have the second call done with "abcbcbbbeeeedfaaccm", and "bedm", you get minWindow as beeeedfaaccm. Now the third call should be with "abcbcbbbeeeedfaacc", and "bedm" and since minWindow will be null this time, avoid further calls to FindSecondStringInFirst method. This will optimize your solution in a big way.
I hope I am clear in conveying my idea. Please let me know if you see any problem with the proposed modification to your solution.
I ended up using Knuth-robin-pratt. I imported the book Clarissa, or, the History of a Young Lady by Samuel Richardson. My first approach was to find all the index of chars in which an occurrence of the word happened. The I remembered you want the first smallest substring in which it occurs.
it is a bit complex but I really do hope it helps! oh and complexity is O(n+m) where n is the size of the word and m is the size of the document.
//
// main.m
// Knuth-robin-pratt
//
// Created by Developer on 25/11/13.
// Copyright (c) 2013 Yet2. All rights reserved.
//
#import <Foundation/Foundation.h>
NSMutableArray * getOverlapArray(NSString *word)
{
NSMutableArray *overlap;
char c;
unsigned long m = [word length];
int v;
overlap = [[NSMutableArray alloc] initWithCapacity:m];
overlap[0] = [NSNumber numberWithInteger:-1];
for(unsigned int i = 0; i< m-1; i++)
{
c = [word characterAtIndex:i+1];
v = [overlap[i] intValue];
if([word characterAtIndex:v+1] != c && v !=-1){
v = [overlap[v] intValue];
}
if([word characterAtIndex:v+1] == c){
overlap[i+1] = @(v+1);
} else
{
overlap[i+1] = @-1;
}
}
return overlap;
}
void knuthRobinPratt(NSString * word, NSString *document)
{
unsigned long k = 0; //last position to start again
unsigned long m; //number of chars in document
unsigned long j = 0; //position in document
unsigned long n; //number of chars in word
unsigned long i = 0; //Position in word
unsigned long numberOfRepetitions = 0; //number of times word is repeated
NSMutableArray *overlap;
overlap = getOverlapArray(word);
m = [document length];
n = [word length];
while (m-k >= n)
{
while (i < n && [document characterAtIndex:j] == [word characterAtIndex:i])
{
i++;
j++;
}
if(i>=n)
{
NSLog(@" the first range in which the word occurs is[%ld %ld]",k, k+n);
break;
}
if(i == 0) i = 1;
if([overlap[i-1] intValue] > 0){
k = j-[overlap[i-1] intValue];
} else {
if(k==j) j++;
k=j;
}
if(i>0) i = [overlap[i-1] intValue]+1;
}
// NSLog(@"%@ is said :%ld",word,numberOfRepetitions);
}
int main(int argc, const char * argv[])
{
@autoreleasepool {
// insert code here...
NSError *error;
NSString *filePath = @"/Users/developer/Desktop/clarissa.txt";
NSString *document = [[NSString alloc] initWithContentsOfFile:filePath encoding:NSASCIIStringEncoding error:&error];
if(error)
{
NSLog(@"%@",error);
} else {
knuthRobinPratt(@"is",document);
}
}
return 0;
}
I guess this Question can be re-stated as...
Find the smallest possible substring(M) of String(S), such that LCS(M,P)==LCS(S,P) or LCS(M,P)==LEN(P)
It more or like a LCS problem, so it can be solved in O(n*m)+O(k) runtime.
Proof:
LEN(S) = n;
LEN(P) = m;
Lets say such substring exists like, M = S.substring(i,j); j >= i+m & 0 <= i < j <n
and assume that Letters in P have their corresponding replica character positioned as below, in M.
P[0] == M[p0]
P[1] == M[p1]
P[2] == M[p2]
P[3] == M[p3]
....
....
P[m-1] == M[pm-1]
Which means, p0<p1<p2.....<pm-1;
and p0=i & pm-1=j, Since M is substring of S, from i to j.
So all these positions are in increasing Order.
Algo:
1. Create 'm' no.of lists. (list[0],list[1],list[2]....,list[m-1]) // m = length of pattern
2. FOR i = 0 to n-1 // n = length of input string
DO
FOR j=0 to m-1
if(pattern[j]==string[i]) add 'i' to list[j];
END FOR
END DO
END FOR
3. Process m Lists starting at their head nodes as below
list[0].head = p0;
list[1].head = p1;
list[2].head = p2;
list[3].head = p3;
....
list[m-1].head = pm-1;
until they satisfy the property p0<p1<p2.....<pm-1;
and update the min_len = min(pm-1-p0, min_len) or Save the positions p0 and pm-1
4. Redo Step-3 until the lists are exhusted.
#include<stdio.h>
#include<stdlib.h>
void findWindow(char *str1,char *str2,int window,int *min)
{
if(*str2 == '\0')
{
if(*min > window)
*min = window;
return;
}
if(*str1 == '\0')
return;
if(*str1 == *str2)
{
findWindow(str1+1,str2+1,window+1,min);
}
if(window == 0)
findWindow(str1+1,str2,window,min);
else
findWindow(str1+1,str2,window+1,min);
}
int main()
{
char *str1="11000001001001000100";
char *str2="10100";
int window=1000;
findWindow(str1,str2,0,&window);
printf("Least window size is %d ",window);
return 0;
}
Smith-Waterman algorithm (a modified longest common sub-sequence problem) can be used to find the best possible match.
We also need back pointers so that start/end position of the matched sub-sequence can be recovered. It is possible that strPat does not a full sub-sequence with strDNA.
- lasthope November 25, 2013