showell30@yahoo.com
BAN USER- 0of 0 votes
AnswersCode up a simple Bloom Filter for four letter words. Assume all words are lowercase, exactly four letters long, and only use the letters a to z. An example set of test words might be able, band, bend, card, darn, form, grab, harm, look, make, mate, peer, and team, but assume the actual list of words is much bigger.
- showell30@yahoo.com in United States| Report Duplicate | Flag | PURGE
Amazon Software Engineer / Developer Data Structures
@goutham467 BTW here is a way to solve this that explicitly uses a comparison function with a generic algorithm:
def max_by(lst, cmp):
candidate = 0
for i in range(1, len(lst)):
if cmp(lst[candidate], lst[i]) > 0:
candidate = i
return lst[candidate]
def find_celebrity(lst):
# Simplifying assumption: assume the most famous
# person is indeed a celebrity.
return max_by(lst, famous_cmp)
def famous_cmp(i, j):
if know(i, j):
return 1
else:
return -1
def know(i, j):
# 40 is our celebrity
if i == 40:
return False
if j == 40:
return True
# For others, return something
# predictable
return i == j + 10 or i == j - 10
print find_celebrity([10, 20, 30, 40, 50])
I omitted the extra step to fully verify celebrity status.
- showell30@yahoo.com February 28, 2013@goutham467, Instead of thinking of this as a sorting problem, think of it as a min-item-in-list problem. You can easily find the minimum element of an unsorted list in linear time, as long as the list has some sort of transitive ordering. The twist here is that you don't have a truly transitive ordering, since A can know B can know C can A. From the celebrity's standpoint, though, it doesn't really matter, because he or she is not allowed to be in any strange friendship cycles.
When you solve the traditional min-item-in-a-list problem, you have a provisional min and you keep trying to dethrone the provisional min as you iterate through the list. For the celebrity problem, it's the same algorithm with a slightly different mechanism to dethrone the provisional celebrity.
@duckling One possible optimization here is that we try to eliminate candidates more aggressively. When you have i and candidate, call both KNOW(i, candidate) and KNOW(candidate, i). If both know each other or both don't know each other, then you've eliminated both, and you can advance candidate to be i+1 and i to be i+2. I haven't thought this all the way through, and it might not be worth the trouble, but if you can get all the bookkeeping right, it might eliminate the final linear pass and possibly speed up the first linear pass, especially if the KNOW() function call is expensive for some reason.
- showell30@yahoo.com February 28, 2013The array of coin denominations could be arbitrarily long, which makes a DP solution a bit tricky to orchestrate. Suppose you have pennies, nickels, dimes, quarters, and half dollars. Then you have a five-dimensional matrix of coin combinations.
- showell30@yahoo.com February 28, 2013Although it's overkill for this problem, you can model this like an NCAA March Madness single-elimination, which runs in linear time. For each "game," see if A knows B. If A knows B, then B wins, else A wins. After 63 games, you will have a winner that is least as "famous" as all other contenders. To verify that they're actually a celebrity, you'd still need to check the winner's relationship to the 63 losers in another linear pass.
- showell30@yahoo.com February 28, 2013@duckling That's a nice refinement of my solution. As your code illustrates, there's no need to actually perform a swap; just change the value of candidate to be the new i.
- showell30@yahoo.com February 28, 2013Here's example Python code:
def ways_to_make_change(amount, coins):
# Assumption: coins[0] should be the smallest
# denomination for optimal performance.
solution = [0] * len(coins)
amount_needed = amount
while True:
# To generate each new solution, we increase
# at most one of the denominations and roll
# the odometer back to zero for any smaller
# coins.
which_coin = 0
while True:
if coins[which_coin] > amount_needed:
for i in range(which_coin+1):
amount_needed += solution[i] * coins[i]
solution[i] = 0
which_coin += 1
if which_coin >= len(coins):
return
else:
if which_coin == 0:
coins_needed = amount_needed / coins[which_coin]
else:
coins_needed = 1
amount_needed -= coins_needed * coins[which_coin]
solution[which_coin] += coins_needed
if amount_needed == 0:
yield solution
break
coins = [1, 5, 10, 25]
amount = 27
for solution in ways_to_make_change(amount, coins):
print solution
If you have a list sorted by A, then iterate through the list and find runs of common A values and then sort them in place by B. The only problem here is that some languages might not have a handy library method that can sort a subrange of a list. If that's the case, I would just code up my own version of in-place quicksort that takes a list and a range.
- showell30@yahoo.com February 28, 2013Treat this like an odometer problem. Start with zero of each coin. Increment pennies. If pennies takes you past n, then zero pennies and increment nickels. If nickels take you past n, then zero pennies and nickels and increment dimes. If dimes take you past n, then zero pennies/nickels/dimes and increment quarters. If quarters take you past n, then you're done. Each roll of the odometer starts on pennies again.
Once you get a basic odometer working, refine the increment step to be more aggressive. For example, if you are looking at nickels and you're 15 short, jump right ahead to incrementing 3 nickels. If you're 14 short, jump right ahead to two more nickels, and then continue the loop to get more pennies.
Nice, straightforward solution with clean code. An alternative solution is to process the string in reverse order, which makes the CM scenario become a sign flip on C instead of a double subtraction.
- showell30@yahoo.com February 28, 2013@sonesh, are you assuming that subsequences can be non-contiguous? If the two strings are "X****Y***Z" and "X______Y__Z", I think your method will return 3, which is valid under the non-contiguous interpretation.
- showell30@yahoo.com February 28, 2013@bunnyabare, According to the rules, there can be at most one celebrity. Proof: Suppose A and B are both celebrities. A knows B by virtue of B's celebrity and the rule that a celebrity is known by everybody. But then A can't be a celebrity, by virtue of the rule that celebrities can't know anyone. Since any pair of celebrities leads to a contradiction of the rules, there can be at most one celebrity.
- showell30@yahoo.com February 28, 2013There can be at most one celebrity. Suppose A and B are both celebrities. A knows B by virtue of B's celebrity. But then A can't be a celebrity, by virtue of the rule that celebrities can't know anyone. Given that you have only celebrity, you can use a linear algorithm. Start with person 1, and they're the provisional celebrity. With person 2, see if either 2 doesn't know 1 or 1 knows 2, and if so, 1 is no longer a celebrity, and swap 1 and 2. For each new person, try to see if you can eliminate the celebrity's status with two simple checks, and then swap as needed.
After the first linear pass, you'll have a provisional celebrity, and then take another linear pass to verify that he really is a celebrity.
If there are no celebrities, then the second pass will be terminate fairly quickly.
For the concrete case, use a little number theory in advance of coding:
6-packs + 9-packs only -> 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, etc.
add a 20-pack -> 26, 29, 32, 35, 38, 41, 44, 47, 50, 53, 56, 59, etc...
add two 20-packs -> 40, 46, 49, 52, 55, 58, etc.
Basically, if you want more than 60 nuggets, you can always get your exact nugget meal.
For values less than 60, just hard code a lookup table or use the recursive algorithm, which is efficient enough for small numbers despite exponential time.
The only unreachable numbers are 1,2,3,4,5,7,8,10,11,13,14,16,17,19, 22, 23, 25, 28, 31, 34, 37, and 43.
Michael's being a little bit snarky, but if we're gonna assume our computer's too crippled to do multiplication, why wouldn't we break out the slide rule? ;)
- showell30@yahoo.com February 27, 2013This is a fine answer, and it's basically using "#" as a sentinel for null. If your binary tree contained characters, and "#" was a valid value, you'd need to escape values of "#" to make this work.
- showell30@yahoo.com February 27, 2013This code is a bit dangerous, because it mutates polynomial1. Here's your failing test:
assert {0:5, 3:7, 5:35} == sum(polynomial1, polynomial2)
assert polynomial1 == {0:5,5:12}
I thought of linked lists too, but in the context of this question, I'm not sure I could justify them over normal lists, even for the sparsed exponent assumption. The advantage of linked lists over contiguous lists is that you can insert into the middle of a linked list in O(1) time, but you don't need that for adding polynomials. Am I missing something?
- showell30@yahoo.com February 27, 2013Actually, 2 children vs. 3 children is gonna be very close, and I think a 3-child scheme may even be better. You can estimate the total cost with code like this:
def compute_time2(m, n):
if m*2 > n:
return 0
else:
# mostly accurate
return compute_time2(m*2, n) + 2
def compute_time3(m, n):
if m*3 > n:
return 0 # no children
else:
return compute_time3(m*3, n) + 3
def compute_time15(m, n):
if m*15 > n:
return 0 # no children
else:
return compute_time15(m*15, n) + 15
print compute_time2(1, 80000)
print compute_time3(1, 80000)
print compute_time15(1, 80000)
Oops, my description of the three-way scenario is a little off, but hopefully you get the idea. When each parent node has to do more work, the cost of the later computations goes up. For large numbers of N, you are effectively comparing 2log2(N) vs. 3log3(N). Even though log3(N) is always less than log2(N) for N > 1, it's greater than 2/3log2(N) for any large value of N.
- showell30@yahoo.com February 27, 2013@Kevin, My solution only makes sense for a distributed computation where the actual computation is expensive. Assume for the sake of the exercise that network latency is negligible compared to the cost of the computation, and that summing two numbers takes a full second of computation time.
Even for a relatively expensive computation, doing a binary tree is way overkill for small values of N. For a small number of nodes, you're best off just having all nodes send their value to a single master node and let the master node do its thing.
But let's assume N=12, and let's assume a really fast network. With a binary tree, the nodes will take this many seconds before their computation is finished
12: 0
11: 0
10: 0
9: 0
8: 0
7: 0
6: 1 (only adds 6 and 12 values)
5: 2
4: 2
3: 3 (has to wait for 6 before adding 6 to 3+7)
2: 4 (has to wait on 4 and 5)
1: 5( gets 3 at t3, adds 3 to itself, then gets 2 at t4)
That makes for a total of five seconds.
Now let's set it up so that nodes have three children:
1 is parent of 3, 4, 5
2 is parent of 6, 7, 8
3 is parent of 9, 10, 11
4 is parent of 12
Nodes 2, 3, and 4 start their computation immediately, but they have three sums to perform (themself + first child + second child + third), so they're busy a full three second. So they all finish at t=3 and send their value immediately to node 1. When node 1 gets those three values, it takes node 1 another three seconds to compute its sum. So the total running time is six seconds.
By giving the lower-id nodes more children, we've made them the bottleneck, undermining the power of a distributed system.
If you can have an arbitrary set of variables, then represent monomials like this:
(((a,4),(x, 3), (y,2)), 5) # 5a**4x**3y**2
Represent polynomials as an array of monomials, sorted lexographically.
To add polynomials, merge the two lists starting from the front. You can easily determine if two monomials have the same combination of variables, and, if so, you combine them, adding the coefficients.
Python Recursive algorithm with Caching.
def strategy(pots):
cache = {}
def s(i, j):
# Given: pots from i to j
# Assuming rational players, return this tuple
# to represent your strategy:
# only, L, R
# how much gold in this pot
# 1st player bounty
# 2nd player bounty
# rest of the game
if i == j:
return ('only', pots[i], pots[i], 0, None)
if (i,j) in cache:
return cache[(i,j)]
l_next = s(i+1, j)
r_next = s(i, j-1)
_, _, l1, l2, _ = l_next
_, _, r1, r2, _ = r_next
l_outcome = pots[i] + l2
r_outcome = pots[j] + r2
if l_outcome > r_outcome:
result = ('L', pots[i], l_outcome, l1, l_next)
else:
result = ('R', pots[j], r_outcome, r1, r_next)
cache[(i,j)] = result
return result
return s(0, len(pots)-1)
pots = [6, 2, 3, 4, 7]
print pots
print strategy(pots)
Another serialization:
tree_rep:
_ => null node
:<tree_rep left>,<serialize value>,<tree_rep right> => regular node
If your tree contains integers, you might have something like this:
:_,2,:_,3,:_,4,_
Serialize a NULL tree as zero bytes.
Otherwise:
- Serialize the root.
- Serialize 0, 1, 2, or 3 based on whether subtrees exists (0=neither, 1=left only, 2=right only, 3=both)
- Serialize both subtrees
When it comes time to deserialize the tree, the 0/1/2/3 will prevent recursive calls from slurping up data that should belong to the parent nodes:
If no bytes left, return NULL tree.
Read node value.
Read mask.
If mask == 0, return control to caller.
If mask == 1, recursively read left subtree then return.
If mask == 2, recursively read right subtree then return.
If mask == 3, recursively read both trees then returns.
An example serialization might be A3B0C1D2E0, which maps to B <- A -> ((D -> E) <- C).
A good optimization here is to use the smaller of a and b as the loop index (i.e. swap a and b if b < a). For large values of a/b, you can optimize this even further by doing it in log(N) time if the interview allows you bit shifts. Something like this:
if a == 0: return 0
if a == 1: return b
half_a = a >> 2
rem_a = a % 2
sum_half_a = sum(half_a, b)
if rem_a == 1:
return b + sum_half_a + sum_half_a
else:
return sum_half_a + sum_half_a
Upvoted. The way the question is question is phrased, I think the interviewer is looking for an answer like yours. If I were the interviewer, my follow up question would be something like this: "Now suppose you're given a very large dataset that's already sorted by field A, but now you want to use B as a tiebreaker..."
- showell30@yahoo.com February 26, 2013Another option for sparse polynomials is store a list of tuples ordered by the exponent of x.
def sum_poly(poly1, poly2):
def sum(i1, i2):
if (i1 >= len(poly1)):
return poly2[i2:]
if (i2 >= len(poly2)):
return poly1[i1:]
p1, c1 = poly1[i1]
p2, c2 = poly2[i2]
if p1 == p2:
return [(p1, c1+c2)] + sum(i1+1, i2+1)
elif p1 < p2:
return [(p1, c1)] + sum(i1+1, i2)
else:
return [(p2, c2)] + sum(i1, i2+1)
return sum(0, 0)
poly1 = [(0, 5), (4, 3)] # 3x**4 + 5
poly2 = [(0, 2), (2, 7)] # 7x**2 + 2
assert [(0, 7), (2, 7), (4, 3)] == sum_poly(poly1, poly2)
@chenic626 All three of these options can be done in constant space and NlogN time. I think option 2 is gonna make the most sense for most people, and most languages have a fairly idiomatic way to sort with a compare function or key function that looks at two fields. I don't understand your Ka/Kb notation, so maybe I'm missing something.
- showell30@yahoo.com February 25, 2013@sonesh: Try coding this with a1b1c1. When you write c to the final output position, you're going to overwrite b.
- showell30@yahoo.com February 25, 2013@sonesh: Consider the tree 1 <- 2 -> 3. If you want to find numbers in the range 1 to 3 (inclusive), then your algorithm will first find 1 as the root. An inorder traversal with 1 as the root won't get you back to 2 and 3, unless you have parent pointers or a stack. It's really that simple.
- showell30@yahoo.com February 25, 2013Python Lazy Dictionary Approach.
We lazily build a dictionary of words from s1 as we try to find matches in s2.
# Find the longest substring present in two different strings.
# This is kind of an overkill approach.
def test():
s1 = "cbalaskdfjthe_answerlkounot_quitevo_lajslkj"
s2 = "alnot_quitesoiuthe_answerjoiunoiuhkbiwi"
assert "the_answer" == match(s1, s2)
def match(s1, s2):
# This helper function takes a list of indexes
# in a string and creates a dictionary whose
# keys are letters and whose values are the
# list of indexes immediately to the right
# of occurrences of the letter in s1.
def build_dict(subt):
dct = {}
for i in subt:
if i < len(s1):
c = s1[i]
if c not in dct:
dct[c] = []
dct[c].append(i+1)
return dct
# We lazily build a tree of all words in s1.
# We only go one letter deep until we encounter
# the letter in s2
tree = build_dict(range(len(s1)))
max_word = ''
for i in range(len(s2)):
word = ''
t = tree
for j in range(i, len(s2)):
c = s2[j]
if c not in t:
break
word += c
subt = t[c]
if type(subt) == list:
dct = build_dict(subt)
t[c] = dct
t = t[c]
if len(word) > len(max_word):
max_word = word
return max_word
test()
@Vinod, Read the awkwardly worded question again and you'll see that's it's a valid interpretation to find just one element in the range. It says "print the node," which suggests you're just trying to find one match.
@Rahul, The long thread on this page that starts with "find the first one" has a detailed discussion of the running times for two different implementations, using the interpretation that you have to report all values in the range, not just one.
An optimal solution should produce the next string with O(k) operations. It takes a minimum of k operations to persist the letters of the string, so you can't do better than O(k), and determining where to get each letter should be constant time.
The total running time should be proportional to k times the product of all the string lengths.
I like to use the odometer approach here, because it's very straightforward to analyze the running time.
Python Iterative Search.
This code avoids recursion by maintaining its own stack. It's an inorder traversal of the tree, but it only yields elements that are within range, and it short circuits traversing the parts of the tree that are out of range, as well as short circuiting stack pushes for parts of the tree that are more than max.
def traverse_range(tree, min, max):
stack = []
while tree or stack:
if tree:
left, val, right = tree
if val <= max:
stack.append(tree)
tree = left
if val < min:
tree = None
else:
tree = stack.pop()
left, val, right = tree
if val > max:
return
if min <= val:
yield val
tree = right
There's a PEP called "Syntax for Delegating to a Subgenerator" that's been accepted in Python, but I'm not sure if they actually optimise for the delegated-yield issue.
- showell30@yahoo.com February 24, 2013Ok, I think I see what you mean. Let's say you have a pretty deep tree, say 1M elements and about 20 deep. All the elements in the bottom layer require 20 touches before they're finally yielded. Python does have a "yield from" in recent versions, I believe, that prevents this problem. In earlier versions, I would just change the code to pass in a visitor function to collect the elements.
- showell30@yahoo.com February 24, 2013There is no multiplicative factor for height in my algorithm. Even in the worst-case scenario, where the tree is completely vertical, it visits every node once, for a running time of 2H. I have no idea where you're getting the H-squared from.
People often implement BSTs without parent pointers to save space and maintenance overhead. Without parent pointers, it's tough to write a successor function. With parent pointers and/or some other storage scheme that allows for a reasonable successor function, I see the merits of your solution.
Python Odometer Solution:
Using an odometer-based approach is arguably overkill for this problem, but it's a good technique to have in your bag of tricks. An odometer function yields the cartesian product of a bunch of integer ranges. It comes up in other problems. For example, suppose you have a prime factorization for a number, and you want to generate all the composite factors of the number. You can use an odometer to generate all the combinations of exponents.
def test():
strings = [
'abcd',
'123',
'XY'
]
for s in cartesian(strings):
print s
def cartesian(strings):
max_indexes = map(len, strings)
for indexes in odometer(max_indexes):
letters = [strings[i][j] for i, j in enumerate(indexes)]
yield ''.join(letters)
def odometer(max_indexes):
odometer = [0] * len(max_indexes)
def advance_odometer():
i = len(odometer) - 1
while i >= 0:
odometer[i] += 1
if odometer[i] < max_indexes[i]:
return True
else:
# roll over to next index
odometer[i] = 0
i -= 1
if i < 0:
return False
while True:
yield odometer
if not advance_odometer():
break
test()
Python recursive solution:
def cartesian(lst):
if not lst: return
def f(i, prefix):
if i == len(lst) - 1:
for c in lst[i]:
yield prefix + c
else:
for c in lst[i]:
for s in f(i+1, prefix+c):
yield s
for s in f(0, ''):
yield s
strings = [
'abcd',
'123',
'XY'
]
for s in cartesian(strings):
print s
@Anonymous, of course there are occasions where you traverse both subtrees. Nothing in my answer implied otherwise. If @sonesh's approach is easy, then somebody should post the code.
- showell30@yahoo.com February 23, 2013Here is a working example of code that doesn't require any special operations from a binary tree other than "right" and "left."
def test():
lst = [1,2,3,4,4,5,6,7,7,7,8,9,10,11,12,13,14,15]
tree = create_tree(lst)
assert [4,4,5,6,7,7,7,8,9,10,11,12] == list(traverse_range(tree, 4, 12))
assert [] == list(traverse_range(tree, 16, 100))
assert lst == list(traverse_range(tree, 0, 100))
def traverse_range(tree, min, max):
def traverse(t):
if t is None:
return
left, root_val, right = t
if min <= root_val:
for v in traverse(left):
yield v
if min <= root_val <= max:
yield root_val
if root_val <= max:
for v in traverse(right):
yield v
for item in traverse(tree):
yield item
# In Python a simple way to represent a binary
# tree is as a tuple: (left_tree, val, right_tree)
def create_tree(lst):
def _maketree(low, high):
# Making a balanced tree is simply a matter
# of taking the middle value as the root
# and recursively building the subtrees
# on the left and right.
if low > high:
return None
if low == high:
return (None, lst[low], None)
mid = (low + high) / 2
val = lst[mid]
left = _maketree(low, mid - 1)
right = _maketree(mid + 1, high)
return (left, val, right)
return _maketree(0, len(lst) - 1)
test()
@Anonymous, the recursive calls to traverse make it so that you output _all_ the nodes. Change "output" to "visit" or "yield" to be more general.
- showell30@yahoo.com February 23, 2013I've edited my answer to say "Assuming the tree is balanced."
- showell30@yahoo.com February 23, 2013The problem with inorder traversals is that it's O(N), and this problem can be solved in O(log N) pretty easily. If you do go with the inorder traversal approach, which has the merit of simplicity, you just need to keep track of the value of the last node you visited, and you're done.
- showell30@yahoo.com February 23, 2013@sonesh, Have you thought about how you are going to do an inorder traversal starting from the middle of the tree? Are you going to have parent links or maintain kind of stack? Try coding this and I bet you'll see why this is a much better solution:
if tree is null, you're done
if min <= root, traverse left
if min <= root <= max, output root
if root <= max, traverse right
It takes 10 times as long to increment a 10-word-long number as it does to increment a 1-word long number, which is why this algorithm is O(log(log(N)). On a 32-bit machine, you can represent the longest run of 1s in your number in a 32-bit word if the number is less than 2**(2**32). After that you need multi-words. It's rare that operations are truly O(1). Even "normal" addition is O(number of bits in the word), but for practicality sake, we we call it O(1) with the constraint that we'll never exceed our word size.
- showell30@yahoo.com February 23, 2013What I call "successor" is actually the predecessor. The code works; I was just sloppy with the names.
- showell30@yahoo.com February 23, 2013C Solution with Tests:
The mildly fiddly part here is that a node's predecessor can be either a left child or a parent. Otherwise, it's basic recursion.
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
struct search_result {
int found;
int val;
};
struct tree {
int val;
struct tree *left;
struct tree *right;
};
struct search_result biggest(struct tree *tree) {
struct search_result sr;
if (!tree) {
sr.found = 0;
return sr;
}
if (tree->right) return biggest(tree->right);
sr.found = 1;
sr.val = tree->val;
return sr;
}
struct search_result successor(struct tree *tree, int val) {
struct search_result sr_not_found;
sr_not_found.found = 0;
if (!tree) {
return sr_not_found;
}
if (tree->val == val) {
return biggest(tree->left);
}
if (tree->val < val) {
if (!tree->right) return sr_not_found;
if (tree->right->val == val && !tree->right->left) {
if (tree->right->val != val) return sr_not_found;
struct search_result sr;
sr.found = 1;
sr.found = 1;
sr.val = tree->val;
return sr;
}
return successor(tree->right, val);
}
else {
return successor(tree->left, val);
}
}
struct tree *make_node(int n) {
struct tree *tree = malloc(sizeof(struct tree));
tree->left = NULL;
tree->right = NULL;
tree->val = n;
return tree;
}
int main(int argc, char **argv) {
struct search_result sr;
sr = successor(NULL, 3);
assert(!sr.found);
struct tree *t1 = make_node(1);
struct tree *t2 = make_node(2);
struct tree *t3 = make_node(3);
struct tree *t4 = make_node(4);
struct tree *t5 = make_node(5);
struct tree *t6 = make_node(6);
t2->left = t1;
t2->right = t3;
t4->left = t2;
t4->right = t5;
t5->right = t6;
sr = successor(t1, 1);
assert(!sr.found);
sr = successor(t2, 2);
assert(sr.found && sr.val == 1);
sr = successor(t2, 3);
assert(sr.found && sr.val == 2);
sr = successor(t4, 5);
assert(sr.found && sr.val == 4);
sr = successor(t4, 4);
assert(sr.found && sr.val == 3);
sr = successor(t4, 3);
assert(sr.found && sr.val == 2);
return 0;
}
I think the interviewer here probably wants a simple solution like this:
I would start with this before exploring more exotic data structures and algorithms.
- showell30@yahoo.com February 28, 2013