Google Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: Phone Interview
Initially, prepare a inverted index, which maps the character to all the words having that character. When a c_set is given to be tested if it covers the w_set perform -
1. Create a bitmap for all words in w_set set to 0 initially.
2. Read the given c_set and find the words associated with that c from the inverted index. Update the value for those words in the bitmap to 1.
3. After that has been done for all characters in the c_set, check if the bitmap has all 1's. If yes, then the c_set covers the w_set else no.
A word bitmap? Do you mean a hashtable? That should work but seems the time complexity is worse than just traversing the w_set. Although for a fixed w_set, it can be treated as preprocessing. And it will also take a lot of space. Maybe there's better solution since the interviewer asked me to improve the time complexity.
I think Speeder meant to create a bitmap of size with # of words in the fixed word_set. Now, for each c in c_set you will get a list of corresponding words. But we need to update the bit-map with the position corresponding to the word in w_set. Speeder didn't mentioned how to find this position. Since, without this position this strategy will not work later in Step 3 when you are comparing for all 1's. If all are distinct words then it can be matched only with the total number but a word_set is supposed to contain duplicate words also. I think we can augment the inverted index records by maintaining the index positions of each word while creating the index. Such as, a->(apple,1); b->(ibm, 2); c->(cisco, 3); g->(google, 4). Now suppose, there is i in c_set, then we will maintain i->(ibm,1), (cisco, 2). Without this position information it would be difficult to process Step 3 as mentioned by Speeder. Now, with this info in inverted index you can basically arrange the word based on the corresponding bit position. In Step 3: check all the bit positions for 1 to be true, otherwise false.
Yes, that's more efficient than a hashmap. If the w_set is an array of words, the position is easy to get. The only problem is still we need to store a lot of duplicate mappings, like for apple in w_set {‘apple’, ‘ibm’, ‘cisco’, ‘google’} you'll need to store a->0, p->0, l->0, e->0 at the same time. Not sure whether it's the best way. But it does work.
We can just do the other way of creating a bit map of c_set. Now when ever we read the book. Pick each character and check if the bit is set in c_set. Where ever you find bit set move to next word. o(no of characters in the book) in worst case?
preprocessing()
1. Create a bit map for the C Set
2. Build a trie for the book
validateSet()
1. Starting at level 0 in the trie find all the words starting at level 0 that are found in the C Set i.e. first character is found in C Set and make a note of the indexes that are not found
2. For the indexes that are not found move to the next level
3. Keep on doing this until set of non found words become zero.
This algo is better as you need not look at all the words if a prefix is found you are good.
preprocessing()
1. Create a bit map for the C Set
2. Build a trie for the book
validateSet()
1. Starting at level 0 in the trie find all the words starting at level 0 that are found in the C Set i.e. first character is found in C Set and make a note of the indexes that are not found
2. For the indexes that are not found move to the next level
3. Keep on doing this until set of non found words become zero.
This algo is better as you need not look at all the words if a prefix is found you are good.
For the first problem I tried to use a bitmap for these characters instead of a hashtable and check iteratively for every word.
For the second problem, I only came up with a solution that get a minimum c_set for the w_set first and also store all the words in a hashtable with the the first character as the key so that we can check the not covered words iteratively. But seems the interviewer's not satisfied with my answer.
@zyfo2: For the follow-up: Do you mean to say for generating a c-set that covers w-set or to check the cover condition given a c_set (this is same as before).
If it for generating a c_set for a given fixed w_set, then we can take a greedy approach to find the first distinct character of each word of w_set and store it in c_set. It is more interesting if it is required to find a minimum size c_set for a given w_set.
Sorry for the confusion. Actually I mean, given a fixed w_set, and check for any given c_set whether it covers w_set. Just a reminder that there can be multiple distinct minimum c_set for a fixed w_set so just generating a minimum c_set is not the solution. Maybe you can generate all possible minimum c_set but I don't even know how to generate the minimum c_set.
if the problem cars only to find if for any c in c_set if there exists w in w_set such that c in w_set then
for ascii text, all we need to do is parse w_set and create a bit set of 26 chars (or 256 if we consider all ascii chars, assume array of 256 is init to 0 ) as soon as the bit set is all filled up (meaning all 256 locations are set to 1) stop processing w_set and simply iterate through each c in c_set and see if it is in ascii array.
Since we do not care for exact location and just if it "covers" it should be fast as in real world you will most probably will not parse whole w_set
I think I made it clear already. It means EVERY w in w_set contains SOME c in c_set. So for w_set {baby, boy, football}, c_set {b} and {a,o} both cover this w_set, but c_set {y} won't cover this w_set.
@zyfo2: why c_set {b, y} won't cover w_set {baby, boy, football}. everywork in w_set contains b which is in c_set
1. Prepare a c_set bitmap.
2. For each word, For each character in the word, compare with the c_set bitmap to find its match, if match found get the next word and repeat this step
3. If no match, return false;
4. If all the words are finished with scanning, return true;
Yeah, this is exactly what I did for the first problem and then the interviewer asked me to improve for the fixed w_set case. So I guess there're better solutions.
public static long setBit(long v, int i) {
return v | (1 << i);
}
public static long buildMask(char[] c) {
long mask = 0;
for(int i=0; i<c.length;i++) {
mask = setBit(mask, c[i]-'a');
}
return mask;
}
public static boolean covers(char[] c, String[] w) {
long mask = buildMask(c);
for(int i=0; i<w.length; i++) {
char[] wc = w[i].toCharArray();
long maskW = buildMask(wc);
if( (mask & maskW) == 0 )
return false;
}
return true;
}
Create a bitmap of c_set. size of this bitmap would depend upon on the Universal set of characters we are expecting and only those bits corresponding to characters in c_set will be set.
For each word in w_set create a bitmap similar to the one for c_set. Then if AND of bitmap(c_set) & bitmap(w in w_set) == bitmap(c_set). Then c_set covers w_set else continue with other words in w_set.
This method has one shortcoming.
It can not handle the situation when c_set contains characters more than once.
Complexity : O(n)
public class WsetCsetCovers
{
/**
* Is the wSet covered by the cSet?
* @param wSet word set
* @param cSet character set
* @return true if each word in the word set contains at least one char from the cSet
*/
public boolean isCovered(Set<String> wSet, char[] cSet)
{
boolean isCovered = true;
BitSet cBitSet = getBitSet(cSet);
List<BitSet> wBitSets = new LinkedList<BitSet>();
BitSet commonBits = null;
BitSet unionBits = null;
// cost O(w*n) where w is the size of wSet and n is the average characters of the wrods
for (String word : wSet)
{
BitSet wBitSet = getBitSet(word.toCharArray());
commonBits = buildCommonBitSet(commonBits, wBitSet);
unionBits = buildUnionBitSet(unionBits, wBitSet);
wBitSets.add(wBitSet);
}
if (isCovered(commonBits, cBitSet))
{
return true;
}
if (!isCovered(unionBits, cBitSet))
{
return false;
}
// cost is O(w)
for (BitSet wBitSet : wBitSets)
{
if (!isCovered(wBitSet, cBitSet))
{
return false;
}
}
return isCovered;
}
private BitSet buildUnionBitSet(BitSet unionBits, BitSet wBitSet)
{
if (unionBits == null )
{
unionBits = (BitSet) wBitSet.clone();
}
else
{
unionBits.or(wBitSet);
}
return unionBits;
}
private BitSet buildCommonBitSet(BitSet commonBits, BitSet wBitSet)
{
if (commonBits == null )
{
commonBits = (BitSet) wBitSet.clone();
}
else
{
commonBits.and(wBitSet);
}
return commonBits;
}
/**
* Do the two bit sets share at least a common bit set to 1 .
* @param bitSet1 bit set
* @param bitSet2 bit set
* @return tru if the two bit sets share at least a common bit set to 1
*/
private boolean isCovered(BitSet bitSet1, BitSet bitSet2)
{
return bitSet1.intersects(bitSet2);
}
/**
* Covers the char array to bit set
* @param chars char array
* @return bit set that whose bits are set to 1 using the char's int value as bit index
*/
protected BitSet getBitSet(char[] chars)
{
BitSet wBitSet = new BitSet(256);
for (Character myChar : chars)
{
wBitSet.set(myChar, true);
}
return wBitSet;
}
@Test
public void testCovered()
{
Set<String> wSet = new HashSet<String>();
wSet.add("google");
wSet.add("amazon");
wSet.add("test");
assertTrue(new WsetCsetCovers().isCovered(wSet, new char[]{'g', 'a', 'e'}));
assertTrue(new WsetCsetCovers().isCovered(wSet, new char[]{'g', 'a', 'e', 't'}));
assertFalse(new WsetCsetCovers().isCovered(wSet, new char[]{'g', 'a'}));
assertFalse(new WsetCsetCovers().isCovered(wSet, new char[]{'g', 'a', 'w'}));
}
@Test
public void testCommonCovered()
{
Set<String> wSet = new HashSet<String>();
wSet.add("abcd");
wSet.add("abef");
wSet.add("abwz");
assertTrue(new WsetCsetCovers().isCovered(wSet, new char[]{'a'}));
assertTrue(new WsetCsetCovers().isCovered(wSet, new char[]{'b'}));
assertTrue(new WsetCsetCovers().isCovered(wSet, new char[]{'a', 'b'}));
}
@Test
public void testUnionCovered()
{
Set<String> wSet = new HashSet<String>();
wSet.add("ab");
wSet.add("cd");
wSet.add("ef");
assertFalse(new WsetCsetCovers().isCovered(wSet, new char[]{'z'}));
assertFalse(new WsetCsetCovers().isCovered(wSet, new char[]{'w', 'z'}));
assertFalse(new WsetCsetCovers().isCovered(wSet, new char[]{'a', 'w'}));
assertFalse(new WsetCsetCovers().isCovered(wSet, new char[]{'a', 'b', 'c', 'w'}));
}
}
I assume c_set is a set of characters and w_set is a set of words? And basically c_set covers w_set means that all the words in w_set contains at least one character from c_set?
- kpj March 01, 2013One solution I can think of (similar to some of the mentioned solutions) is:
1. create 26-bit (using 4 bytes) bit mask where i-th bit is set if i-th character of the alphabet is in c_set. For example, if c_set is {a, b, c, g}, it is like 11100010000.... let's call it cmask
2. For each word, create a similar mask. For "apple", bits for a, p, l, e are set. Let's call it wmask_i for w_i.
Then compute bitwise AND with cmask and each wmask_i. If any of them evaluates to 0, stop there and return false. If every cmask & wmask_i is non-zero, c_set covers w_set.
I can't really think of any good solutions for the second part. Small optimizations that I can think of are:
1. precompute wmask_i.
2. wmask_1 & wmask_2 & ... & wmask_n indicates characters that appear in all of the words. If cmask & this w_mask is non-zero, c_set covers w_set
3. wmask_1 | wmask_2 | ... | wmask_n indicates all the characters that appear in w_set. If cmask & this wmask is zero, c set doesn't cover w_set.