Amazon Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: In-Person
Sorry, the returned value for maybeInSet should be computed:
return (table | hash(str)) == table;
All 4 chars fit in an integer, so:
int hash(char * str) {
int h = ((int)char[0]) << 24;
h |= ((int)char[1]) << 16;
h |= ((int)char[2]) << 8;
h |= ((int)char[3]) << 0;
return h;
}
So the trick of the Bloom filter is that you only want each word to set a few bits in the filter. If each word leads to an average of ten bits being set in the filter, it doesn't take too many words until every bit in the shared bitmap is 1, at which time you effectively get positive guesses every time, which means way too many false positives.
// Here's a Java implementation
static int MAX_BYTE = 488;
static int MIN_BYTE = 388; // i.e., "aaaa"
static int[] keys = new int[100]; // used because MAX_BYTE - MIN_BYTE
static String[] words = {"able", "band", "bend", "card", "darn", "form", "grab",
"harm", "look", "make", "mate", "peer", "team"};
public static void main(String[] args)
{
createBloom(words);
System.out.println("may contain: " + "abbe" + mayContain("abbe"));
}
public static void createBloom(String[] values)
{
for(int i = 0; i < values.length; i++)
{
int key = getKey(values[i].getBytes());
if(keys[key - MIN_BYTE] == 0)
{
keys[key - MIN_BYTE]++;
}
}
}
public static int getKey(byte[] bit)
{
int out = 0;
for(int i = 0; i < bit.length; i++)
{
out += bit[i];
}
return out;
}
public static boolean mayContain(String query)
{
int keyCheck = getKey(query.getBytes()); // find query key
return keys[keyCheck - MIN_BYTE] == 1;
}
I didn't know what a blossom filter was. I just learnt a few things and I want to give it a try. Would this make sense?
Given that we are working with ASCII from 97 to 122, we can consider that the changing bits for each letters are the 5 least-significant (the ones to the right in a little-endian machine).
So we can use an array of 20 bits and just assign directly.
}
- Anonymous March 16, 2013