Google Interview Question
Software Engineer in TestsI think suffix tree, make a suffix tree of pattern and move the string over it....tricky suffix tree soln, if anybody interested for complete code just let me know..
hii netappreject, can u please provide me with the complete code for the above question using suffix tree with detailed explanation.. my is id piyushkumar081@gmail.com
thanks in advance...
Though I am sure my solution is not elegant it works and it is O(n). The inner for loop in the code is a small one(i think I can further halting points) and AFAIK it does make the algorithm larger than O(n). I got idea for this code from the largest sum problem(-ve numbers included) in programming pearls. Here goes the code, let me know your thoughts.
Code is not being allowed here so I put it on ideone
http: //ideone.com / Z3TDm
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace CareerCupProblems
{
class Program
{
static void Main(string[] args)
{
Program p=new Program();
string chars = "adbb";
string str = "ajfblkdasb";
int[] ar = p.MinimumWindow(chars, str);
Console.WriteLine(ar[0] + " " + ar[1]);
}
//http://www.careercup.com/question?id=3268664
//Finding the minimum window in a string in which we can find all the chars in another string
public int[] MinimumWindow(string chars, string str)
{
Dictionary<char, int> charhash = new Dictionary<char, int>();
foreach (char c in chars)
{
if (charhash.ContainsKey(c))
charhash[c]++;
else
charhash.Add(c, 1);
}
int minSoFar = str.Length;
int minStart = 0, minEnd = 0;
for (int i = 0; i < str.Length; i++)
{
if (charhash.ContainsKey(str[i]))
{
Dictionary<char, int> tempCharHash = new Dictionary<char, int>(charhash);
if (tempCharHash[str[i]] == 1)
tempCharHash.Remove(str[i]);
else
tempCharHash[str[i]]--;
int restCount = chars.Length;
for (int j = i + 1; j < i+1+restCount && j < str.Length; j++)
{
if (tempCharHash.ContainsKey(str[j]))
{
if (tempCharHash[str[j]] == 1)
{
tempCharHash.Remove(str[j]);
if (tempCharHash.Count == 0)
{
if (minSoFar > j - i)
{
minSoFar = j - i;
minStart = i; minEnd = j;
}
break;
}
}
else
tempCharHash[str[j]]--;
}
else restCount++;
}
}
}
return new int[] { minStart, minEnd };
}
}
}
Explaining my algorithm
1. Create a charhash with the characters T with <character,count>
2. Initiliaze minSoFar=str.length and minstart and maxstart
3. traverse the String S i to S.length
3.1 If S[i] in present in hash
3.1.1 Create a new temp hash cloning/copying the charhash
3.1.2 Remove this char(S[i]) from the temp hash if the count is 1 or reduce the count
3.1.3 Initialize a window size to characters T.length.
3.1.4 traverse the string starting from j=i+1 to i+1+windowsize
3.1.4.1 if tempcharhash contains the str[j]
3.1.4.1.1 remove the char from the tempcharhash if count is 1 here also check if the tempcharhash is empty
3.1.4.1.2 if yes update maxsofar,minstart and minend values and break
3.1.4.1.3 reduce the count if tempcharhash[str[j]] count is not 1
3.1.4.2 if the str[j] is not present increase the window size
3.2 Return minstart and minend
@ftfish: your approach is not exactly correct.
"when the left border moves right, the right border also does." it fails in below case:
string: acccabeb
pattern: ab
windows should be [1,6], then [5,6], then [5,8]
smallest is [5,6]. your approach can't find this boundary, as you mentioned to forward right boarder whenever left boarder forwards.
No. formally (and still not completely formal) I said, when left border moves right, the right border will never move left.
In your example:
first [1,6]
The left border moves until index 5 without moving The right border. This will find [5,6].
Then another move of the left border forces the right border to move, to the end. [5,8] will not be found by my algorithm because it is clearly bigger than [5,6]
Now, it seems convincing. But, I think it'd be O(n|T|) time solution, because for window[i,j] it needs to check if it contains all chars of set T atleast once. Please share if you think it can be truly doable in O(n) time, and explain the details to do the above check in O(1) time. Thanks.
maintain a hash table of characters in the window (including the number of appearances) and a global counter indicating how many chars in the set T are in hashtable.
left border i moves right => remove one character s[i] (number of appearances--) from the hashtable. If the count of s[i] becomes 0, decrement the global counter.
right border j moves right => insert one character s[j] into the hashtable. if s[j] is in T and s[j] was not in the hashtable, increment the global counter.
move the left border by one every time, and move the right border right (possibly stays still) until the global counter == |T|
The set T should also be implemented with a hash table so that we can check whether a char is in T or not. if the chars are all in ascii, an array of size 128 will do.
@ftfish,
In the example being discussed, we are saying
string: acccabeb
pattern: ab
But in the question T = Set and so T={'a','b'}
hence if string is string: bacccaeb, your smallest window is [1,2], i.e. occurrance of all elements of T (order does not matter).
Let me know if you aggree and if yes, "Left most border of pattern" needs more thought.
It seems all occurences need to be recorded down and analyzed later.
Think about this case,
S= aaabdacefaecbef
T=a,b,c
possible windows: (1,7), (3,7), (4,7), (7,14)... (11,14),
The min is 4.
The above seems can not be o(n)
The original question missed some condition? Any ideas?
After a night of thinking, it can be resolved in o(n).
• Window: contain all chars in T, can be multiple, if the char repeats
• The min window, size is smallest
combine the above ideas, the key point:
• Use hash table to make it possible to o(n), it will reduce the char matching
• Repeated char can cause multiple windows, must be addressed, every char in T keep all index values the chars in T appears, stored in an array of int vectors, V
• To end the string browsing earlier, we can keep a count in both global and every char in T(init as 0), whenever we find a new char, increase the count to 1, (never change again), and increase the global count. If the global count is size of T, we stop browsing and do processing for final result.
• The final data processing: browse through the array of Vector V,
o Pick one int from each of the vector, that construct the 1st window, keep all the values in the form of tuple(array?) for the min window, keep two values, min and max, and window size = max – min
o Iterate through all values in the array, replace the value in the correlated tuple, recalculate the window size, if the size is smaller than the cur value, replace the tuple and update the values
Code is complicated, will do it when I have time.
this can be done in O(n log n) using binary search
assume L and H are window widths where
-> there are no window of size L conatining all characters
-> there is atleast one window of size H that contains all characters
* now we can check in O(n) whether any window of size k contains all characters.
-> find if there exists a windows that contains all the characters in window of size (L+H)/2
so it becomes a binary search after all :)
i am still thinking if it can be done in O(n)
Here is my solution:
void MinimumWindow(const char * str, const char * set)
{
int hash[256];
int i, set_len;
int winStart = -1, winSize = 0, chCount = 0;
const char *p;
for (i = 0; i < 256; i ++)
{
hash[i] = -2;
}
for (p = set, set_len = 0; *p; p ++, set_len++)
{
hash[(unsigned char)*p] = -1;
}
for (i = 0; str[i]; i ++)
{
int hash_val = hash[str[i]];
if (hash_val == -2)
{
if (winStart != -1 && chCount < set_len)
{
winSize++;
}
continue;
}
if (hash_val == -1)
{
if (winStart == -1)
{
winStart = i;
}
winSize++;
chCount ++;
}
else if (chCount >= set_len)
{
int newStart = i, newSize;
for (p = set; *p; p ++)
{
int ii = hash[(unsigned char)*p];
if (*p != str[i] && newStart > ii)
{
newStart = ii;
}
}
newSize = i + 1 - newStart;
if (newSize < winSize)
{
winStart = newStart;
winSize = newSize;
}
}
else
{
winSize ++;
}
hash[str[i]] = i;
}
printf("start = %d, size = %d\n", winStart, winSize);
}
1. Record the positions for each character, ex,
For "abcbdef", record as
a->0
b->1,3
c->2
d->4
e->5
f->6
2. Get the most tight range for the search characters {'b','e'}, that is to get the tight range of
b->1,3
e->5
Use back-tracing, we can get the final result. While coding, could use some rules for pruning.
The complexity might be O(N) <not sure>.
The problem can be solved in O(n) by using hashtable.
First, counting the frequenties of each chars in the string S, and create a mapping M { char -> int }.
We only consider chars which are contained in the set T.
Second, maintain two pointers, one from the left end, and the other one from the right end.
We move the two points towards center of the string:
1. Before moving the pointer, reduce the pointed char frequency by 1
2. Stop if the pointed char frequency is 0.
public String minWindow(Set<Character> t, String s) {
Map<Character, Integer> map = new HashMap<Character, Integer>();
for (Character c : t) {
map.put(c, 0);
}
for (int i = 0; i < s.length(); i++) {
char ch = s.charAt(i);
Integer freq = map.get(ch);
if (freq != null)
map.put(ch, freq + 1);
}
// Corner case, the String s does not satisfy the condition
for (Character c : t) {
if (map.get(c) == 0)
return ""; // return empty window
}
int pLeft = 0, pRight = s.length() - 1;
while (pLeft < pRight) {
char ch = s.charAt(pLeft);
Integer freq = map.get(ch);
if (freq > 1) {
pLeft++;
map.put(ch, freq - 1);
continue;
}
ch = s.charAt(pRight);
freq = map.get(ch);
if (freq > 1) {
pRight--;
map.put(ch, freq - 1);
continue;
}
break;
}
return s.substring(pLeft, pRight + 1);
}
The intuition behind the idea is as follows:
Treat the input string S as a bag of chars. Bag is a data structure that allows duplicates in set, or, in other words, a bag is a mapping from a value to frequency.
We want to find a longest substring L in S, such that L is a bag of chars with each char's frequency greater than zero.
So we start from left end and right end of the string S, reduce the length of the string S as long as the bag property holds.
I think without the intuition the solution was clearer:).
Because you use the order of the characters in the string, thus you do not consider it as a bag of chars.
Actually, the code does not look exactly correct.
char ch = s.charAt(pLeft);
Integer freq = map.get(ch);
if (freq > 1) { // here freq may be == NULL if the first and the last characters in s do not lie in t
then you will return the whole string
Just add if(freq==NULL) { pLeft++; continue; }
and similar for pRight
Here is my solution. Suprising that no one used Regex.
public class MinimumWindowCharMatch {
public static String findMinWindowCharMatch(Set<Character> c, String s) {
String minWindoCharMatch = "";
Set<Character> trackSet = null;
StringBuilder sbr = null;
if (c != null && s != null && s.length() > c.size()) {
trackSet = new HashSet<Character>();
sbr = new StringBuilder();
sbr.append("[");
for (Character character: c) {
sbr.append(character);
}
sbr.append("]");
System.out.println(sbr.toString());
for (int i = 0; i < s.length(); i++) {
Character character = s.charAt(i);
if (character.toString().matches(sbr.toString())) {
if (trackSet.add(character)) {
minWindoCharMatch = minWindoCharMatch + character;
if (trackSet.size() == c.size()) {
break;
}
} else {
trackSet.clear();
minWindoCharMatch="";
}
} else {
trackSet.clear();
minWindoCharMatch="";
}
}
}
return minWindoCharMatch;
}
public static void main(String[] args) {
Set<Character> c = new HashSet<Character>();
c.add('a');
c.add('r');
c.add('c');
c.add('t');
c.add('e');
System.out.println(findMinWindowCharMatch(c, "backtraced"));
}
}
One more solution in java. Realy O(N)
import java.util.*;
public class MinWindow {
public static void main(String[] args) {
MinWindow p = new MinWindow();
String chars = "adbb";
String str = "ajfblkdasb";
int[] ar = p.MinimumWindow2(chars, str);
System.out.println(ar[0] + " " + ar[1]);
}
class ThreadInfo {
Hashtable<Character, Integer> charhash;
int startIndex;
int endIndex = Integer.MAX_VALUE;
ThreadInfo(int startIndex, Hashtable<Character, Integer> charhash) {
this.charhash = new Hashtable<Character, Integer>(charhash);
this.startIndex = startIndex;
}
boolean isFull() {
return charhash.isEmpty();
}
boolean decrement(Character c, int index) {
if (! isFull() && charhash.containsKey(c))
if (charhash.get(c) == 1) {
charhash.remove(c);
if(isFull()) {
endIndex = index;
return true;
}
} else {
charhash.put(c, charhash.get(c) - 1);
}
return false;
}
}
public int[] MinimumWindow2(String chars, String str) {
Hashtable<Character, Integer> charhash = new Hashtable<Character, Integer>();
for (char c : chars.toCharArray()) {
if (charhash.containsKey(c))
charhash.put(c, charhash.get(c)+1);
else
charhash.put(c, 1);
}
List<ThreadInfo> openThreads = new ArrayList<ThreadInfo>();
for (int i = 0; i < str.length(); i++) {
char c = str.charAt(i);
if (!charhash.containsKey(c)) {
continue;
}
//Hashtable<Character, Integer> tempCharHash = new Hashtable<Character, Integer>(charhash);
openThreads.add(new ThreadInfo(i, charhash));
for(ThreadInfo ti : openThreads) {
ti.decrement(c, i);
}
}
int minLen = Integer.MAX_VALUE;
ThreadInfo minTi = null;
for(ThreadInfo ti : openThreads) {
if (ti.isFull()) {
int len = ti.endIndex - ti.startIndex;
if (len < minLen) {
minLen = len;
minTi = ti;
}
}
}
return new int[]{minTi.startIndex, minTi.endIndex};
}
}
public static int globalLowerBound;
public static int globalUpperBound=Int16.MaxValue;
public static List<int> arrA;
public static List<int> arrB;
//Console.WriteLine("----------- FIND SUBSET WINDOW -----------------");
//Program.arrA = new List<int> {1,2,5,1,2,6,5,5};
//Program.arrB = new List<int> { 1, 6, 5, 2 };
//Program.FindSubsetWindow( arrA.Count-1,0);
//Console.WriteLine(Program.globalLowerBound + ", " + globalUpperBound);
public static void FindSubsetWindow(int upperBound, int lowerBound)
{
if (((upperBound - lowerBound) < arrB.Count<int>()-1) || lowerBound <0)
{
return;
}
bool found = false;
foreach (int b in arrB)
{
found = false;
for (int i = lowerBound; i <= upperBound; i++)
{
if (b == arrA[i])
{
found = true;
}
}
if (!found)
{
return;
}
}
if (found)
{
if ((upperBound - lowerBound) < (globalUpperBound - globalLowerBound))
{
globalLowerBound = lowerBound;
globalUpperBound = upperBound;
}
FindSubsetWindow(upperBound - 1, lowerBound);
FindSubsetWindow(upperBound, lowerBound + 1);
}
}
1) Initialize an array 'Count' of size |T| to -1, where Count[i] refers to the last seen index of T[i] within S
2) Scan through S, and update Count array, and from the point when all the characters within T are 'seen' and say we are currently scanning S[i] (this is easy to keep track of, say have a counter X which is incremented whenever a value Count[i] is changed FROM -1)
- Get the lowest value within Count array, and subtract it with current index i.
- Keep track of the smallest difference found so far
This can be achieved in O(nk) where n is the length of string and k is the characters in the set. On O(nk), all possible windows as well as minimum window can be found. Take the solution on the same lines as counting sort.
Additionaly, k can at max be 26 here given that the set is a character set. Does anyone see a problem in this approach
prime substitution + minimum length subarray with product of char substitutions being a multiple of product of char substitution in T.
- randomcoder August 27, 2010