DNA sequence(a string) is giv

Flipkart Interview Question for SDE-2s

0

of 0 votes

8
Answers
DNA sequence(a string) is given (let say strDNA) and another string to search for(let say strPat). You have to find the minimum length window in strDNA where strPat is subsequence(Write code).
- Rahul Sharma November 23, 2013 in United States | Report Duplicate | Flag | PURGE
Flipkart SDE-2 Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: United States

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 0 vote

Smith-Waterman algorithm (a modified longest common sub-sequence problem) can be used to find the best possible match.

dp[i][0] = dp [0][j] = 0;
dp[i][j] = min( dp[i-1][j-1] + ( S1[i] != S2[j] ), dp[i-1][j] + 1, dp[i][j-1]+1);

We also need back pointers so that start/end position of the matched sub-sequence can be recovered. It is possible that strPat does not a full sub-sequence with strDNA.

- lasthope November 25, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

a. In a loop, iterate over substrings of string1. each substring starts from index 0 and of length(str2.length, str2.length+ 1... str1.length)
b. For each such substring find out the part which contains string 2 as its subsequence
c. If length of that window is smaller than one we have already, store it later comparison
d. output smallest such window.

Complexity: O(n^2)

[Looks like this approach can be optimized for shorted substring search]

Below is the code in C#:

MinimalLengthSubseqToContainStr("abcbcbbbeeeedfaaccmmopopopn", "bedm");
 

        private static void MinimalLengthSubseqToContainStr(string str1, string str2)
        {
            int minWindowLen = Int32.MaxValue;
            string result = string.Empty;
            for (int i = str1.Length -1; i >= str2.Length - 1; i--)
            {
                string minWindow = FindSecondStringInFirst(str1.Substring(0, i +1), str2);
                if (!string.IsNullOrEmpty(minWindow))
                {
                    if (minWindow.Length < minWindowLen)
                    {
                        minWindowLen = minWindow.Length;
                        result = minWindow;
                    }
                }
               
            }

             Console.WriteLine(result);
        }

        private static string FindSecondStringInFirst(string str1, string str2)
        {
            int count = str2.Length - 1;
            int index = str1.Length - 1;
            int end = -1;
            int start = -1;
            while (count >= 0)
            {
                if (index < 0)
                {
                    break;
                }
                if (str1[index] == str2[count])
                {
                    if(count == str2.Length - 1)
                        end = index;
                    if (count == 0)
                    {
                        start = index;
                        break;
                    }
                    count--;
                }
                index--;
            }
            if (end > start)
                return str1.Substring(start, end - start + 1);
            else
                return string.Empty;
        }

- manu.machilles November 25, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

@manu.machilles:
You can optimize your solution as explained below:
"abcbcbbbeeeedfaaccmmopopopn", "bedm"

1. The first call to FindSecondStringInFirst(string str1, string str2) with "abcbcbbbeeeedfaaccmmopopopn", and "bedm". And you get minWindow as beeeedfaaccmm. Now instead of making second call to FindSecondStringInFirst with "abcbcbbbeeeedfaaccmmopopop", and "bedm" , call should be with "abcbcbbbeeeedfaaccm", and "bedm".
2. Once you have the second call done with "abcbcbbbeeeedfaaccm", and "bedm", you get minWindow as beeeedfaaccm. Now the third call should be with "abcbcbbbeeeedfaacc", and "bedm" and since minWindow will be null this time, avoid further calls to FindSecondStringInFirst method. This will optimize your solution in a big way.

I hope I am clear in conveying my idea. Please let me know if you see any problem with the proposed modification to your solution.

- Rakesh Roy December 16, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

You could modify the rabin karp algorithm to do that. I will write the code for tonight and post it. :) Have a nice day!

- Ignacio Morales November 25, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I ended up using Knuth-robin-pratt. I imported the book Clarissa, or, the History of a Young Lady by Samuel Richardson. My first approach was to find all the index of chars in which an occurrence of the word happened. The I remembered you want the first smallest substring in which it occurs.

it is a bit complex but I really do hope it helps! oh and complexity is O(n+m) where n is the size of the word and m is the size of the document.

//
//  main.m
//  Knuth-robin-pratt
//
//  Created by Developer on 25/11/13.
//  Copyright (c) 2013 Yet2. All rights reserved.
//

#import <Foundation/Foundation.h>

NSMutableArray * getOverlapArray(NSString *word)
{
    NSMutableArray *overlap;
    char c;
    unsigned long m = [word length];
    int v;
    
    overlap = [[NSMutableArray alloc] initWithCapacity:m];
    overlap[0] = [NSNumber numberWithInteger:-1];
    
    for(unsigned int i = 0; i< m-1; i++)
    {
        c = [word characterAtIndex:i+1];
        v = [overlap[i] intValue];

        if([word characterAtIndex:v+1] != c && v !=-1){
            v = [overlap[v] intValue];
        }
        
        if([word characterAtIndex:v+1] == c){
            overlap[i+1] = @(v+1);
            
        } else
        {
            overlap[i+1] = @-1;
        }
        
    }
    return overlap;
}


void knuthRobinPratt(NSString * word, NSString *document)
{
    unsigned long k = 0; //last position to start again
    unsigned long m; //number of chars in document
    unsigned long j = 0; //position in document
    unsigned long n; //number of chars in word
    unsigned long i = 0; //Position in word
    unsigned long numberOfRepetitions = 0; //number of times word is repeated
    NSMutableArray *overlap;
    
    overlap = getOverlapArray(word);
    
    m = [document length];
    n = [word length];
    
    while (m-k >= n)
    {
        while (i < n && [document characterAtIndex:j] == [word characterAtIndex:i])
        {
            i++;
            j++;
        }
        
        if(i>=n)
        {
            NSLog(@" the first range in which the word occurs is[%ld %ld]",k, k+n);
            break;
        }
        if(i == 0) i = 1;
        if([overlap[i-1] intValue] > 0){
            k = j-[overlap[i-1] intValue];
        } else {
            if(k==j) j++;
            k=j;
        }
        if(i>0) i = [overlap[i-1] intValue]+1;
    }
    
//    NSLog(@"%@ is said :%ld",word,numberOfRepetitions);

}
int main(int argc, const char * argv[])
{

    @autoreleasepool {
        
        // insert code here...
        NSError *error;
        NSString *filePath = @"/Users/developer/Desktop/clarissa.txt";
        NSString *document = [[NSString alloc] initWithContentsOfFile:filePath encoding:NSASCIIStringEncoding error:&error];
        
        if(error)
        {
            NSLog(@"%@",error);
        } else {
            knuthRobinPratt(@"is",document);

        }
        
        
    }
    return 0;
}

- Ignacio Morales November 25, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I guess this Question can be re-stated as...
Find the smallest possible substring(M) of String(S), such that LCS(M,P)==LCS(S,P) or LCS(M,P)==LEN(P)

It more or like a LCS problem, so it can be solved in O(n*m)+O(k) runtime.
Proof:
LEN(S) = n;
LEN(P) = m;
Lets say such substring exists like, M = S.substring(i,j); j >= i+m & 0 <= i < j <n

and assume that Letters in P have their corresponding replica character positioned as below, in M.
P[0] == M[p0]
P[1] == M[p1]
P[2] == M[p2]
P[3] == M[p3]
....
....
P[m-1] == M[pm-1]

Which means, p0<p1<p2.....<pm-1;
and p0=i & pm-1=j, Since M is substring of S, from i to j.

So all these positions are in increasing Order.

Algo:
1. Create 'm' no.of lists. (list[0],list[1],list[2]....,list[m-1]) // m = length of pattern
2. FOR i = 0 to n-1 // n = length of input string
DO
FOR j=0 to m-1
if(pattern[j]==string[i]) add 'i' to list[j];
END FOR
END DO
END FOR
3. Process m Lists starting at their head nodes as below
list[0].head = p0;
list[1].head = p1;
list[2].head = p2;
list[3].head = p3;
....
list[m-1].head = pm-1;
until they satisfy the property p0<p1<p2.....<pm-1;
and update the min_len = min(pm-1-p0, min_len) or Save the positions p0 and pm-1
4. Redo Step-3 until the lists are exhusted.

- Dude January 07, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

#include<stdio.h>
#include<stdlib.h>

void findWindow(char *str1,char *str2,int window,int *min)
{
if(*str2 == '\0')
{
if(*min > window)
*min = window;
return;
}

if(*str1 == '\0')
return;

if(*str1 == *str2)
{
findWindow(str1+1,str2+1,window+1,min);

}
if(window == 0)
findWindow(str1+1,str2,window,min);
else
findWindow(str1+1,str2,window+1,min);
}
int main()
{
char *str1="11000001001001000100";
char *str2="10100";
int window=1000;
findWindow(str1,str2,0,&window);
printf("Least window size is %d ",window);
return 0;
}

CareerCup

Flipkart Interview Question for SDE-2s

Books

Videos

Resume Review

Mock Interviews