showell30@yahoo.com
Code up a simple Bloom Filter for four letter words. Assume all words are lowercase, exactly four letters long, and only use the letters a to z. An example set of test words might be able, band, bend, card, darn, form, grab, harm, look, make, mate, peer, and team, but assume the actual list of words is much bigger.
- showell30@yahoo.com in United States
Amazon Software Engineer / Developer Data Structures
@tpcz, How is this problem not inherently exponential? The numbers from positions i to j can generate 3^(j-i+1) different sums, depending on where you put the minus signs and/or omit numbers. How does your DP matrix look for the use case of target 12 and elements 9999, 9998, 11? Aren't there 27 booleans at play here?
- showell30@yahoo.com March 18, 2013
I originally downvoted this question, because it's pretty poorly specified, but I upvoted it back. The author doesn't really make it clear what constitutes a "distinct" solution. For example, why is 6 + 8 - 2 = 12 considered a solution, but 8 + 6 - 2 = 12 isn't? Are we allowed to use a negative sign in front of the first element? How about other operators? If the target number is 0, then can we say that our solution is to add up none of the numbers?
My inference is that the problem boils down to this. For each number, it can either contribute itself to the sum, its negative to the sum, or nothing to the sum. So there are 3^N possibilities to consider. I would do simple recursion with these pseudo-coded relationships:
f([], 0) = 1
f([], a) = 0 whenever a is not zero
f([x|rest], n) = f(rest, n+x) + f(rest, n) + f(rest, n-x)
In my notation [x|rest] means you have a list of at least length 1 where x is head and "rest" denotes the remainder of the elements in the list. I think the problem is ambiguously defined if either your target is zero or your list is empty, but setting up the degenerate cases correctly makes the recursion work.
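A direct Python transcription of those relationships might look like this, with `count_ways` standing in for f:

```python
def count_ways(nums, target):
    # Base cases: the empty list "sums" to zero in exactly
    # one way, and to anything else in zero ways.
    if not nums:
        return 1 if target == 0 else 0
    x, rest = nums[0], nums[1:]
    # x contributes +x, nothing, or -x to the sum.
    return (count_ways(rest, target - x)
            + count_ways(rest, target)
            + count_ways(rest, target + x))
```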
- showell30@yahoo.com March 18, 2013
Here's the implementation, by the way:
def test():
    # test the string from the web page
    s = '986561517416921217551395112859219257312'
    assert 36 == get_equal_sum_substring(s)
    # test with a longer string
    s = '9' + ('5' * 8000) + '9' + ('6' * 7999) + '8'
    assert 8002 == get_equal_sum_substring(s)

def get_equal_sum_substring(s):
    s = list(map(int, s))
    # S[i] is the sum of the first i digits.
    S = [0]
    total = 0
    for digit in s:
        total += digit
        S.append(total)
    def sub_sum(low, high):
        return S[high] - S[low]
    # Set slen to the provisional length of half of
    # the substring.  We start with the highest possible
    # value of slen so that we can return as soon as we
    # find an equal-sum substring.
    slen = len(s) // 2
    while slen > 0:
        for i in range(len(s) - 2*slen + 1):
            lsum = sub_sum(i, i+slen)
            rsum = sub_sum(i+slen, i+slen*2)
            if lsum == rsum:
                return 2 * slen
        slen -= 1
    return 0

test()
@Anonymous, If you do center-inward the naive way, then it's no better than the center-outward approach, because you're not traversing the sums in a way that allows you to terminate early. You can do a center-inward search in a breadth-first manner, and keeping the outward sums would allow you to quickly compute the next-inward sums when you get back to them, but then you're facing a lot of work. The whole idea of caching the long sums, so that you can subtract off elements to get the next-longest sums rather than recomputing a long sum, is what got me to the S() function epiphany in the first place. If you use the S() function, then you get all these advantages:
1) You only ever compute n sums.
2) You can iterate through the results in descending order.
3) To evaluate each potential result you merely compute a difference between two S() values in constant time.
4) You don't have to maintain any complicated data structures. It's just O(N) storage for the values of S().
Python Solution for Subsequence (non-contiguous)
Many of the solutions on this page are solving the trivial problem of contiguous runs in a list. A subsequence can actually have gaps, which makes this a bit more challenging, but you can do it in O(N log N) time. My solution is actually N-squared, but the linear search can be fairly trivially converted to a binary search.
# This solves the LSS problem--longest sorted subsequence.
def test():
    a = [7, 8, 1, 5, 4, 6, 9]
    assert LSS(a) == [1, 4, 6, 9]

def LSS(a):
    if len(a) == 0:
        return []
    # candidate_maxes[i] is the smallest possible
    # tail value of a sorted subsequence of
    # length i+1 seen so far
    candidate_maxes = [a[0]]
    predecessor = {}
    for elem in a[1:]:
        if elem > candidate_maxes[-1]:
            # This is the case where our new element
            # extends the longest subsequence so far.
            predecessor[elem] = candidate_maxes[-1]
            candidate_maxes.append(elem)
        else:
            # Do a linear search to see if we
            # can make one of our previous subsequences
            # easier to extend.  For a large N, we could
            # convert this to a binary search.
            for i in range(len(candidate_maxes)):
                if elem < candidate_maxes[i]:
                    candidate_maxes[i] = elem
                    if i > 0:
                        predecessor[elem] = candidate_maxes[i-1]
                    break
    elem = candidate_maxes[-1]
    result = [elem]
    while elem in predecessor:
        elem = predecessor[elem]
        result.append(elem)
    result.reverse()
    return result

test()
Isn't there a race condition between releasing id2's read lock and acquiring id2's write lock? During that time somebody else could adjust a2's balance, only to have it wiped out by the assignment to temp.
- showell30@yahoo.com March 18, 2013
Good luck! I came upon the clever shortcut trying to think along DP lines--basically, structuring your computations so that recursive relations make things really simple--but I really struggled for a while. I think I've probably seen this technique in some other context, because it was obviously lurking in my subconscious. In some ways it's analogous to basic calculus--compute the integral function so that computing the sum of diffs across a range becomes simple subtraction.
- showell30@yahoo.com March 18, 2013
Like I said, I think the problem is a bit contrived. For a really simple use case, it's probably more than sufficient to simply synchronize TransferAccount, as you're probably gonna have bottlenecks elsewhere. And as soon as you start introducing real world problems like connecting to databases that might be down, etc., the problem gets more complex than simply ordering locks.
But you're right, I'm a little off topic. :) I just upvoted the lock ordering solution.
Python Successor Approach
This is a mostly iterative approach toward generating the permutations. It introduces a successor function, which perhaps simulates better how a human would approach the problem--we have a number, and we want to find the next viable candidate.
# Generating permutations is a fun challenge,
# but it's also nice to have a successor function
# that can just give you the next permutation.
def permutations(num_digits):
    n = 0
    for i in range(num_digits):
        n = n * 10 + i
    bitmap = make_bitmap(num_digits, n)
    while n:
        yield n
        n = successor_helper(num_digits, n, bitmap)

# This is for one-stop shopping.
def successor(num_digits, n):
    bitmap = make_bitmap(num_digits, n)
    return successor_helper(num_digits, n, bitmap)

# If you want successive permutations, call
# successor_helper with a bitmap.
def successor_helper(num_digits, n, bitmap):
    ones = n % 10
    tens = n // 10
    bitmap[ones] = False
    for ones in range(ones+1, 10):
        if not bitmap[ones]:
            bitmap[ones] = True
            return tens * 10 + ones
    # we overflowed
    if num_digits == 1:
        return None
    tens = successor_helper(num_digits-1, tens, bitmap)
    if tens is None:
        return None
    for ones in range(10):
        if not bitmap[ones]:
            bitmap[ones] = True
            return tens * 10 + ones
    return None

def make_bitmap(num_digits, n):
    bm = [False] * 10
    for i in range(num_digits):
        bm[n % 10] = True
        n //= 10
    return bm

def test():
    def test_successor(n, exp_new_n):
        bm = make_bitmap(4, n)
        new_n = successor_helper(4, n, bm)
        assert new_n == exp_new_n
        assert bm == make_bitmap(4, new_n)
    test_successor(123, 124)
    test_successor(132, 134)
    test_successor(139, 142)
    test_successor(189, 192)
    test_successor(198, 213)
    test_successor(1987, 2013)
    assert 213 == successor(4, 198)
    assert successor(4, 9876) is None
    assert 9 * 10 == len(list(permutations(2)))
    assert 7 * 8 * 9 * 10 == len(list(permutations(4)))

test()
@Anonymous, yeah, the only drawback of the centers-outward approach is that you can't start by evaluating the longest possible candidates first, and therefore exiting early as soon as you find a match. If the likelihood of palindromes is fairly high, it's a significant benefit to proceed in reverse order. For the centers-outward approach, you are basically committed to evaluating all N*(N-2) sums. If the likelihood of a palindrome is reasonably high, this is inefficient.
With the precompute solution, your best-case scenario (the whole string is equally balanced) will run in linear time.
P.P.S. See my response to @tpcz for a more efficient way to speed up the sum-of-range computation than using a cache.
- showell30@yahoo.com March 17, 2013
@sonesh, I'm joking. I'm playing off the pseudo-misspelled-as-sudo pun. Don't blame me, Anonymous started it. :)
- showell30@yahoo.com March 17, 2013
A simple way to make the sum-of-a-range computation work in constant time is to precompute the function S(n) for all values n in linear time, where S(n) represents the sum of all elements to the left of element n:
S(n) = a[0] + a[1] + a[2] + ... + a[n-1]
so S(n) = S(n-1) + a[n-1]
Then to compute the sum of the range a[low:high], you take S(high) - S(low), using the Python convention that high is the first number NOT in the range.
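In code, the precompute step and the constant-time range sum might look like this (function names are mine):

```python
def prefix_sums(a):
    # S[n] is the sum of a[0:n], built in one linear pass.
    S = [0]
    for x in a:
        S.append(S[-1] + x)
    return S

def range_sum(S, low, high):
    # Sum of a[low:high], where high is the first index
    # NOT in the range.
    return S[high] - S[low]
```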
P.S. In actual wall time it's hard to outperform the naive algorithm with a caching strategy, because the "constant" time of a cache hit is significantly higher than just summing up numbers over a small range. It's also tricky to optimize for cache hits, because your strategy for breaking down sums can lead to overlapping ranges, so you would probably prefer some kind of DP-ish strategy that primes the cache, or use some sort of skip-list scheme.
- showell30@yahoo.com March 17, 2013
It's N cubed. You have two outer loops around an O(N) sum. To make this faster you need to avoid recomputing partial sequences, by using a cache or some kind of precomputed matrix (DP). See "Python Solution with Cache" for a caching approach. N cubed gets slow real fast. Try coding your naive algorithm for N=1000.
- showell30@yahoo.com March 17, 2013
A super naive algorithm is to count from 0000 to 9999, and just skip over any number with duplicate digits. Over half of the numbers (5040) are valid candidates, so it's not horribly wasteful. Then you think about it some more, and you realize every number from 7700 to 7799 is a "bad" number, so you can detect numbers of the form nn00, and immediately skip past 100 values. Pretty soon you'll converge on what's essentially an odometer algorithm. When you roll the odometer, instead of incrementing the current digit, increment to the next digit not already used by a higher digit. You can micro-optimize digit detection by having a length ten map of which numbers are available.
- showell30@yahoo.com March 17, 2013
The correctness argument goes something like this. Let's say the heads of X, Y, and Z are 10, 5, and 20, so that y=5 is the min. Consider all tuples (x, 5, z). Since X is sorted we know that x >= 10 for all x, and likewise all z >= 20. So now we know any solution involving y=5 has the form (x >= 10, 5, z >= 20). Since y=5 is less than 10 and 20, we clearly know the optimal solution for that form has to be (10, 5, 20), which we just evaluated. Therefore, since we have proven that we've already considered the optimal solution for y=5, we can advance the Y pointer beyond y=5 to find more optimal solutions over (X, Y, Z). It's also pretty easy to convince ourselves that the algorithm terminates in linear time, since we always exclude one element from one of the lists. (To be precise, it's O(Nm) time, where m=3, but we can treat m as a constant in the current formulation.)
Even though it's pretty easy to prove the correctness of the algorithm, there is still risk in messing up the implementation. Fortunately, this problem has a trivial O(N^3) solution as well, so you can use your slow implementation to validate the fast implementation for a bunch of randomly generated small lists. This won't conclusively prove that the faster algorithm works, but it can help identify bugs in the implementation.
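Assuming the underlying problem is to pick one element from each of three sorted lists so as to minimize max minus min, the pointer-advancing idea above might be sketched like this (names are mine):

```python
def closest_triple(X, Y, Z):
    # X, Y, and Z are sorted lists.  At each step the current
    # triple is optimal for the current minimum value, so we
    # record it and advance the pointer at the minimum.
    i = j = k = 0
    best = None
    while i < len(X) and j < len(Y) and k < len(Z):
        x, y, z = X[i], Y[j], Z[k]
        spread = max(x, y, z) - min(x, y, z)
        if best is None or spread < best[0]:
            best = (spread, (x, y, z))
        if x <= y and x <= z:
            i += 1
        elif y <= x and y <= z:
            j += 1
        else:
            k += 1
    return best[1]
```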
This is a classic phone screen question. Leo's solution is the classic answer. If the first element isn't drawn on the first iteration, you want to keep it as a candidate, hence the swap. (And the swap also effectively removes the just-chosen number as a candidate, and adds it to the result set, effectively killing three birds with one stone.)
If you can process each number on the fly (by outputting them, using them directly, yielding them from a coroutine, appending to a passed in collection, or calling back a callback function), then replace the swap with a simple assignment of a[r] = a[i], since you won't need to revisit the value of a[i] after emitting it.
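Leo's swap-based approach, as I understand it, might be sketched like this (choosing k random elements from a list; the function name is mine):

```python
import random

def sample_k(a, k):
    # Partial Fisher-Yates shuffle: after i iterations,
    # a[:i] holds the sample so far and a[i:] holds the
    # remaining candidates.
    a = list(a)
    for i in range(k):
        r = random.randrange(i, len(a))
        # The swap removes a[r] from the candidate pool
        # while keeping a[i]'s old value as a candidate.
        a[i], a[r] = a[r], a[i]
    return a[:k]
```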
How about a sudoku random number? After all, repeats are disallowed in any square, any row, and any column. Problem solved!
- showell30@yahoo.com March 17, 2013
Yep, I would only go with the fancy solution if space was severely constrained. Overflow's a fiddly issue, but it's relatively straightforward to manage big integers with a simple multi-byte integer class, since sums and squares are pretty straightforward.
Why use a hash map instead of a bit vector or simple array of booleans? You know that the numbers range from 1 to N and are unique. Why pay the price of collisions in a hash table?
The problem with an XOR solution for the two-numbers-missing problem is that 1 xor 2 is the same as 5 xor 6 and 13 xor 14. It's not clear to me that you've actually resolved that conundrum. Maybe show some tests?
- showell30@yahoo.com March 17, 2013
In a banking scenario, you often have these conditions:
1) Individual accounts have very little activity.
2) Accessing accounts has very high latency.
A small problem to avoid is the race where the same account is being incremented/decrement at the exact same millisecond. This can be handled with a simple mutex.
The bigger problem is that additions/subtractions to an account can't actually be committed until the remote account confirms the trade.
From a practical standpoint, the bank wants to err on the side of rejecting transactions prematurely. In other words, it wants to avoid ever promising funds from an account that has uncommitted credits, but it also wants to avoid promising funds where provisional debits would lessen the balance upon commit.
So let's say Alice agrees to transfer $100 to Bob's account, and the transaction is initiated from a data center remote to both banks. The financial soundness of the transaction only hinges on Alice having sufficient funds in her account, so the remote data center notifies Alice's bank of the desired transfer. If Alice's provisional balance is too low, the transaction is immediately rejected. Otherwise, her available balance is immediately debited, and you make sure the transaction can be committed on Bob's end (i.e. his account is active, the network is available, etc.). At this point, the transaction is sound from a financial standpoint, but you can't commit it until Bob eventually gets the money in his bank. So, then you post the transfer to Bob's account, and if he confirms it, you then commit the transfer on Alice's end.
Remember that I said that individual accounts have very little activity, so for each account, you can inexpensively manage a data structure of all the uncommitted transfers. In some cases uncommitted transfers will be explicitly rolled back by the other party, but you also have to account for data failures, so you will have expiration logic. Here, you need to be very careful managing funds availability. Alice's bank can't release funds for her until it's certain that Bob hasn't been credited. On the other hand, Bob's bank can roll back his credit after a timeout.
Asking this problem in terms of Java is kind of contrived, because this is really more of a systems engineering question than a coding question. With a single data center, you would be more likely to use a locking model than a phased-commit model, but bank transactions typically involve lots of latency between banks.
Python Solution with Cache
This Python code iterates through all the possibilities in descending order of length, and it avoids recomputing partial substrings by caching intermediate sums.
def test():
    # test the string from the web page
    s = '986561517416921217551395112859219257312'
    assert 36 == get_equal_sum_substring(s)
    # test with a longer string
    s = '9' + ('5' * 500) + '9' + ('6' * 499) + '8'
    assert 502 == get_equal_sum_substring(s)

def get_equal_sum_substring(s):
    # Turn the string of digits into an array of ints.
    s = list(map(int, s))
    # For a small s, the substring_cache is not
    # really worth it, but for longer strings it
    # makes sense.
    substring_cache = {}
    def sub_sum(low, high):
        if low + 1 == high:
            return s[low]
        n = substring_cache.get((low, high), None)
        if n is not None:
            return n
        n = s[low] + sub_sum(low+1, high)
        substring_cache[(low, high)] = n
        return n
    # Set slen to the provisional length of half of
    # the substring.  We start with the highest possible
    # value of slen so that we can return as soon as we
    # find an equal-sum substring.
    slen = len(s) // 2
    while slen > 0:
        for i in range(len(s)):
            if i + slen*2 > len(s):
                break
            lsum = sub_sum(i, i+slen)
            rsum = sub_sum(i+slen, i+slen*2)
            if lsum == rsum:
                return 2 * slen
        slen -= 1
    return 0

test()
@tpcz, agreed, I'm in the same place on this.
The naive approach that I first considered is actually O(N^3): you simply iterate over all O(N^2) possible substrings and then compute the suitability of each solution in O(N) time. But you can get back to O(N^2) by precomputing a matrix of substring sums using DP.
For the iteration, my first thought is to try to consider the long lists first, since you can at least return early as soon as you find a substring that satisfies the condition. So, for a 6-digit input, consider the substrings in this order:
123456
1234
2345
3456
12
23
34
45
56
I don't see a clever way to prune the iteration. For example, among the foursomes, does something about one foursome's result allow you to eliminate other foursomes? I'm not seeing it yet.
Iterate through the digits from left to right. When you encounter the broken key digit, decrement it and then turn all the subsequent digits to 9.
But wait! The edge cases are 0 and 9. If your 0 key is broken, then you need to backtrack one digit before doing the decrementing and 9-replacement steps. If your broken key is 9, then you need to use 8 for the subsequent digits.
I would first code this to work for digits 1 through 8, and have a little test suite. Then I'd introduce the 0/9 edge cases, adding tests for those as well.
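Here's a sketch of that digit walk, under my reading of the problem: find the largest number not exceeding n whose digits avoid the broken key. The function name and the borrow handling for the 0/9 edge cases are my own assumptions:

```python
def largest_without_digit(n, broken):
    # Largest number <= n whose decimal digits avoid `broken`.
    # Assumes n > 0 and a single broken key.
    digits = list(str(n))
    fill = '9' if broken != 9 else '8'
    for i, d in enumerate(digits):
        if int(d) == broken:
            # Decrement this digit; borrow leftward if the
            # decrement hits the broken digit or goes below zero.
            j = i
            while j >= 0:
                new_d = int(digits[j]) - 1
                if new_d >= 0 and new_d != broken:
                    digits[j] = str(new_d)
                    break
                j -= 1
            if j < 0:
                # Every prefix digit failed; drop to one fewer digit.
                return int(fill * (len(digits) - 1)) if len(digits) > 1 else None
            # Everything to the right of the decrement becomes
            # the largest usable digit.
            digits[j+1:] = fill * (len(digits) - j - 1)
            break
    return int(''.join(digits))
```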
Also, Gayle, tell me truthfully that you wouldn't look kindly upon a security candidate who had obsessive backup plans for broken light bulbs in her house. Good security people and sys admins have an obsessiveness that borders upon an outright personality disorder.
- showell30@yahoo.com March 17, 2013
Gayle, it obviously depends on the interviewer. I take it as a given that every interview question tests your problem solving skills to some extent, but within that context, some are more about your chops (do you know basic data structures?) and others are more subjective (what's your first instinct in a crisis?). I agree with you 100% that you need to get to the diagnosis aspect pretty quickly. My reaction to a burnt out light bulb would be two steps. First, I presumably need light. So, I find the flashlight, light a candle, turn on another light switch, use my phone as a make-shift flashlight, etc. Once I'm in the diagnosis phase, I start analyzing possible causes. The last time the lights went out in my apartment, it was the circuit breaker, and it was completely obvious, because the TV also shut off. But most of the time, it's the light bulb. If a light bulb goes out, sometimes you can clearly tell that the filament blew, but for some types of bulb, you go into the scientific method. I hypothesize that the light bulb's broken, and if I have a handy spare, I can validate the hypothesis by swapping in the spare. If the spare doesn't work, then I consider other theories. It could be that the original light bulb AND the spare are broken, but it's more likely that I'm dealing with a more systemic problem. Etc.
- showell30@yahoo.com March 16, 2013
It's a strange question, but I assume it's a conversation starter. Possible reactions to a light bulb not working:
1) Lazy response: do nothing.
2) Paranoid response: Assume intruder is in your home, immediately call police.
3) Quick workaround: Immediately go to kitchen, grab flashlight out of drawer.
4) Quick diagnosis: Assume it's just the light bulb, immediately try replacing the light bulb.
5) Deep analysis: See if other lightbulbs are out. Consider hypotheses that you have experienced a local blackout (circuit breaker) or general blackout (neighbors light's are out).
6) Follow up: If it's the light bulb, consider cost/analysis benefits of buying a more hearty light bulb, making flashlight easier to find in a crisis.
This is just a personality question. If I were hiring a security person, I would want somebody who is really Type A, with a healthy sense of paranoia, but also some level of pragmatism. People who are into security need to account for extreme possibilities, so they're the type of people who have backups of every light bulb in their house, a flashlight in every drawer, and a schematic of their house's wiring conveniently laminated on the wall next to their circuit breaker.
I am downvoting this answer, because there are lots of ways to in-place sort a list, but this brief answer doesn't explain why heapsort would be superior to, say, quicksort, given the partial ordering.
Of the most common divide-and-conquer sorts, mergesort is the one that breaks down the problem by recursively creating two sorted lists, so it would seem like the most appropriate sort to adopt for this problem. Unfortunately, it's really hard to do a mergesort efficiently on array-based lists when you don't have contiguous storage at your disposal for the merged result. If the original list were a linked list instead of an array, then you could do it pretty trivially with no extra storage, but the problem explicitly says this is an array.
@nishit, It's not that simple--you need to sort on the anagrams of the values, not the values themselves.
- showell30@yahoo.com March 16, 2013
I didn't downvote it (in fact, I upvoted it back to zero), but I would offer one small critique. Why are you computing the anagrams in your compare method? This is gonna cause them to be computed multiple times in the outer sort, i.e. roughly logN times. Why not use the decorate-sort-undecorate idiom, otherwise known as the Schwartzian transform?
- showell30@yahoo.com March 16, 2013
@xankar, agreed. My comment was mostly directed at the treemap solution, which is neither simple nor clever. If you're gonna use a map, then you should take advantage of the main constraint--all integers from 1 to N--and implement your map as a simple array of booleans, or, better yet, a bitmap.
Other folks have since posted the clever solution. Sum the numbers, and sum the squares of the numbers, find the shortages in both, then use simple algebra to deduce the two missing values. It's kind of a parlor trick, but I would definitely be thrilled if an interview candidate knew it.
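A sketch of the parlor trick, assuming the list holds 1..N with exactly two numbers missing:

```python
def find_two_missing(a, n):
    # a holds the numbers 1..n with exactly two missing.
    # If the missing pair is (x, y), then:
    #   x + y     = n*(n+1)/2         - sum(a)
    #   x^2 + y^2 = n*(n+1)*(2*n+1)/6 - sum of squares of a
    s = n * (n + 1) // 2 - sum(a)
    sq = n * (n + 1) * (2 * n + 1) // 6 - sum(x * x for x in a)
    # (x+y)^2 = x^2 + y^2 + 2*x*y gives us the product, and
    # then x and y are the roots of t^2 - s*t + p = 0.
    # (The float sqrt is fine for modest n.)
    p = (s * s - sq) // 2
    d = int((s * s - 4 * p) ** 0.5)
    x = (s - d) // 2
    return x, s - x
```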
The problem is asking for the max product, not the max sum.
- showell30@yahoo.com March 16, 2013
Nice solution. It's efficient, and your comments strike a nice balance of explaining things that aren't immediately obvious, without being overly verbose.
- showell30@yahoo.com March 16, 2013
@xankar, To keep things simple, let's simplify the problem to say that the original seating arrangement is ABCDE, but our players are finicky about where they sit:
A: wants seat 3
B: wants seat 2
C: wants seat 4
D: wants seat 1
E: wants seat 3
Hopefully you can see how this maps to desired rotations from our original seating arrangement:
A: wants to move 2 clockwise
B: wants to move 0 clockwise
C: wants to move 1 clockwise
D: wants to move 2 clockwise (actual rotation)
E: wants to move 3 clockwise (actual rotation)
After we compute each person's desired shift, we update this data structure:
0 clockwise: 1 vote (from B)
1 clockwise: 1 vote (from C)
2 clockwise: 2 votes (from A, D)
3 clockwise: 1 vote (from E)
4 clockwise: 0 votes
Note that our data structure doesn't keep track of the voters; it's just the counts.
At the end we iterate through our vote tallies and find that 2-clockwise makes the most people happy (2).
Once you understand this scenario, it's a pretty small mental leap to account for the original seating arrangement being something other than ABCDE.
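A sketch of the vote-counting idea, using 0-based seat indices (the walkthrough above uses 1-based seats; names are mine):

```python
def best_rotation(current, wants):
    # current: list of people in seat order.
    # wants: dict mapping each person to their desired
    # seat index (0-based here).
    n = len(current)
    votes = [0] * n
    for seat, name in enumerate(current):
        shift = (wants[name] - seat) % n
        votes[shift] += 1
    # The clockwise shift with the most votes makes the
    # most people happy.
    return max(range(n), key=lambda s: votes[s])
```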
This problem is trivial when the list is sorted, so let's assume it's not. Even for an unsorted list, it's trivial to solve in linear time, so it's not worthwhile to sort the list first.
When only one number is missing, you can compute the sum of the list and compare it to the sum of the first N integers (N*(N+1)/2), and the difference is the missing number. If more than one number is missing, then the clever trick doesn't help a whole lot, since there are multiple pairs of numbers that could account for the shortage.
So, without clever tricks, let's eliminate fancy data structures too. Just allocate an array of N booleans, and set them all to false. In your first pass, go through the input array, and for each number, set the boolean at that index to true. Then, in a second pass, iterate through your array of booleans to find which numbers were left out.
Overall running time is O(N), and because you're not using a fancy data structure like a hash or a tree, it's reliably linear, even for a worst case ordering.
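A minimal sketch of the two-pass boolean-array approach:

```python
def missing_numbers(a, n):
    # a holds distinct numbers drawn from 1..n; return
    # the numbers that were left out, in order.
    seen = [False] * (n + 1)
    for x in a:
        seen[x] = True
    return [i for i in range(1, n + 1) if not seen[i]]
```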
@alex, It's a simple iteration with bookkeeping. Keep track of the max contiguous product seen so far, as well as active prefixes for a new contiguous product. Negative numbers add the twist that you want pre_min as well as pre_max, since a negative number can temporarily flip a sequence to be negative, but you don't want to abandon it, in case a subsequent negative number flips the product back to positive.
If you set aside negative numbers for a bit, then this is the core logic for each iteration:
new_max = pre_max*elem
pre_max = max([elem, new_max])
ans = max([ans, pre_max])
Every iteration updates the best in-progress answer as well as the best overall answer to date (which could be still in progress or just getting started or encountered earlier in the list, before a zero or small fraction).
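Folding the negative-number bookkeeping back in, the full scan might look like this (pre_min tracks the most negative in-progress product):

```python
def max_product(a):
    # pre_max/pre_min track the largest and smallest products
    # of a contiguous run ending at the current element; the
    # min matters because a negative times a negative can
    # become the new max.
    ans = pre_max = pre_min = a[0]
    for elem in a[1:]:
        candidates = (elem, pre_max * elem, pre_min * elem)
        pre_max = max(candidates)
        pre_min = min(candidates)
        ans = max(ans, pre_max)
    return ans
```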
- showell30@yahoo.com March 16, 2013
The problem allows you to use more than two numbers in the product, as long as they're contiguous. Also, it's not explicitly stated, but I'm assuming that a range of one integer can be considered to have a "product" of itself.
Multiplication is commutative, so 5*7*2*3 is not ambiguous.
Also, multiplication has an identity element, so it's reasonable to consider product([42]) to be 42*1, although you'd clarify that with the interviewer.
Last but not least, you need to clarify whether the product of an empty range is considered to be 1 or undefined. A mathematician would probably choose 1, but the engineering context might make undefined more appropriate.
@algos, This is a good start, but if you're not told the insertion point in advance, then you'll need to do the bookkeeping on where the insertion happened. Also, while not explicitly stated in the problem, you can probably expect multiple inserts, so you're gonna want a data structure that scales beyond one insert.
- showell30@yahoo.com March 16, 2013
This is the correct approach. You want to maintain a list of offsets. See "Python Solution with Paged Lists" for a full implementation of this concept. @Geet, If you're doing a large number of inserts, then you might not want a linked list to store the offsets. Better to have an array that allows you to do a binary search to find the pages.
If you're doing this in C, you need to be careful about memory management. You'll want your list of pages to also indicate whether pages were created via malloc() or carved out of a previously malloc'ed page. If you throw deletes into the equation, then you might want to consider a data structure that uses a fairly small page size. Having more pages will slow down random access, but that can be mitigated with a binary search for offsets, and your insertions/deletions will be faster.
Python Solution with Paged Lists:
def test():
    arr = BigArray()
    chunk_size = 10000
    for i in range(100):
        new_elems = range(i*chunk_size, (i+1)*chunk_size)
        arr.insert_elems(i*chunk_size, new_elems)
    assert 43999 == arr[43999]
    assert 500000 == arr[500000]
    arr.insert_elems(900000, range(3000))
    assert 0 == arr[900000]
    assert 1001 == arr[901001]
    assert 900001 == arr[903001]
    arr.insert_elems(903001, [1, 2, 3])
    assert 1 == arr[903001]
    assert 2 == arr[903002]
    assert 3 == arr[903003]
    assert 900001 == arr[903004]

class BigArray:
    def __init__(self, page_size=1000):
        # Keep a list of pages, with an empty page
        # at the end for convenience.
        self.page_offsets = [0]
        self.pages = [[]]
        self.page_size = page_size

    def debug(self):
        for i in range(len(self.pages)):
            print(i, self.page_offsets[i], len(self.pages[i]))

    def __getitem__(self, i):
        if i >= self.page_offsets[-1]:
            raise IndexError("n too large")
        page = self._get_page(i)
        page_i = i - self.page_offsets[page]
        return self.pages[page][page_i]

    def insert_elems(self, n, lst):
        # insert elements from lst into the BigArray,
        # starting at index n
        lst = list(lst)
        if n > self.page_offsets[-1]:
            raise IndexError("n too large")
        if n == self.page_offsets[-1]:
            return self.append_elem(lst)
        # Find the page where insertion starts.
        page = self._get_page(n)
        cnt = len(lst)
        # Split the page after the insertion point.  This
        # might create a small page, but we don't worry
        # about it for simplicity's sake.  We can refine
        # this later by trying to balance our new elements
        # better.
        self._split_page(page, n)
        self._update_offsets_for_insertion(page, cnt)
        low = 0
        # See how much room we have in the current page
        # to avoid needlessly creating new pages.  We
        # may luck out with a partial page from a previous
        # insert.
        room = self.page_size - len(self.pages[page])
        if room > 0:
            # We might have extra room
            if room > cnt:
                room = cnt
            # Extend our current page
            self.pages[page].extend(lst[0:room])
            low = room
        # Now create new pages as needed.
        if low >= cnt:
            return
        new_pages = []
        new_offsets = []
        while low < cnt:
            page_size = min(cnt - low, self.page_size)
            new_pages.append(lst[low:low+page_size])
            new_offsets.append(n + low)
            low += page_size
        self.pages[page+1:page+1] = new_pages
        self.page_offsets[page+1:page+1] = new_offsets

    def append_elem(self, lst):
        # TODO: For now, we always create new pages, but
        # we could try to look at the last nonempty
        # page and see if there's room.
        lst = list(lst)
        cnt = len(lst)
        n = self.page_offsets[-1]
        # remove the empty page temporarily
        self.page_offsets.pop()
        self.pages.pop()
        low = 0
        while low < cnt:
            page_size = min(self.page_size, cnt - low)
            self.page_offsets.append(n + low)
            self.pages.append(lst[low:low+page_size])
            low += page_size
        # put back the empty page
        self.page_offsets.append(n + cnt)
        self.pages.append([])

    def _update_offsets_for_insertion(self, page, cnt):
        # update the offsets of all the pages to the
        # right of the insertion point (if any)
        for i in range(page+1, len(self.pages)):
            self.page_offsets[i] += cnt

    def _get_page(self, n):
        # This should be a binary search, but
        # we do it as a linear search for clarity.
        for page in range(len(self.page_offsets) - 1):
            if n < self.page_offsets[page+1]:
                return page
        # the last page is unbounded, and it's usually
        # an empty page
        return len(self.page_offsets) - 1

    def _split_page(self, page, n):
        if n >= self.page_offsets[page+1]:
            return
        self.page_offsets.insert(page+1, n)
        offset = n - self.page_offsets[page]
        self.pages.insert(page+1, self.pages[page][offset:])
        self.pages[page] = self.pages[page][:offset]

test()
The Banker's algorithm solves a much more complicated problem. Read the question again and then read the Wikipedia article on the Banker's algorithm.
- showell30@yahoo.com March 16, 2013
It only does three passes: one pass to initialize the array, one pass to increment it for each person, and one pass to find the max. It's O(N). The naive approach is basically N-squared, where you try all N rotations and count all N people during each pass. Do you understand what I'm doing differently here?
- showell30@yahoo.com March 16, 2013
Each person has a distance that they wish to be rotated. Calculate this as a positive clockwise distance, so it will be a number from 0 to 4. Initialize an array indexed from 0 to 4 with counts of zero; the array represents the number of people that are happy with each rotation distance. For each person, increment the appropriate slot in the array. At the end, iterate through the array to see which slot has the largest count.
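As a sketch of the counting pass described above (the function name, the default of five chairs, and zero-based positions are my own illustrative choices):

```python
def best_rotation(current, desired, num_chairs=5):
    # counts[d] = number of people made happy by clockwise rotation d
    counts = [0] * num_chairs
    for cur, want in zip(current, desired):
        counts[(want - cur) % num_chairs] += 1
    # One final pass to find the rotation with the largest count.
    return max(range(num_chairs), key=lambda d: counts[d])
```

With everyone one seat short of their goal, rotation 1 wins: `best_rotation([0, 1, 2], [1, 2, 3])` returns 1.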
- showell30@yahoo.com March 16, 2013
So the trick of the Bloom filter is that you only want each word to set a few bits in the filter. If each word leads to an average of ten bits being set in the filter, it doesn't take too many words until every bit in the shared bitmap is 1, at which time you effectively get positive guesses every time, which means way too many false positives.
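For concreteness, here's a minimal sketch of such a filter for the four-letter-word question; the bitmap size, the choice of three hashes per word, and the function names are all my own assumptions:

```python
import hashlib

NUM_BITS = 1 << 16   # size of the shared bitmap
NUM_HASHES = 3       # only a few bits per word, to delay saturation

def _bit_positions(word):
    # Derive a few bit positions from slices of one digest.
    digest = hashlib.sha256(word.encode()).digest()
    for i in range(NUM_HASHES):
        chunk = digest[4 * i:4 * i + 4]
        yield int.from_bytes(chunk, 'big') % NUM_BITS

def make_filter(words):
    bitmap = bytearray(NUM_BITS // 8)
    for word in words:
        for pos in _bit_positions(word):
            bitmap[pos // 8] |= 1 << (pos % 8)
    return bitmap

def maybe_contains(bitmap, word):
    # True means "probably present"; False is definitive.
    return all(bitmap[p // 8] & (1 << (p % 8)) for p in _bit_positions(word))
```

Every inserted word is guaranteed to test positive; a word outside the set can still test positive if its few bits happen to collide with already-set bits, which is exactly why keeping bits-per-word small matters.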
- showell30@yahoo.com March 16, 2013
See the solution from @Gohawks to see how to solve P2+P3=n on a sorted list in O(n) time. You start with P2 as the min and P3 as the max, and then each iteration you either increment P2 or decrement P3, depending on whether P2+P3 is too small or too big, respectively. The magic here is that once P2+P3 >= n, you can stop finding pairs for P3, since all the remaining candidates will be greater by virtue of the list being sorted. This is what prevents the inner loop of this algorithm from being N-squared.
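The two-pointer walk described here can be sketched as follows (the function name is mine, and this version just reports whether a pair exists):

```python
def has_pair_with_sum(sorted_nums, target):
    lo, hi = 0, len(sorted_nums) - 1
    while lo < hi:
        s = sorted_nums[lo] + sorted_nums[hi]
        if s == target:
            return True
        if s < target:
            lo += 1   # sum too small: advance the min pointer
        else:
            hi -= 1   # sum too big: retreat the max pointer
    return False
```

Each iteration retires one element for good, which is why the whole scan is O(n) rather than N-squared.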
- showell30@yahoo.com March 16, 2013I'm assuming the requirements break down like this:
1) We're dealing with big data, not huge data, so millions of values, not billions.
2) We obviously need bulk inserts to be fast.
3) I'm gonna assume a read-mostly context, so we want random access to be at least log(N) fast, so a simple linked list won't cut it.
4) I'm gonna assume inserts are always bulk, so we can amortize the cost or live with the slowness for small inserts.
Given these constraints, I'd create a 2-level tree of integers, with roughly 1000 pages of 1000 integers each. The outer list (i.e. top of the tree) would have the offsets for the inner lists. For random access, you'd binary search the outer list to find your page table and then do an O(1) lookup in the page to get your value. This is essentially how BigTable works, except BigTable has a few more levels.
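The random-access lookup described here (binary search the outer list, then index into the page) can be sketched with the standard library's bisect module; the function name and argument shapes below are illustrative:

```python
import bisect

def lookup(page_offsets, pages, n):
    # Find the rightmost page whose starting offset is <= n,
    # then index into that page directly: O(log P) + O(1).
    page = bisect.bisect_right(page_offsets, n) - 1
    return pages[page][n - page_offsets[page]]
```

With ~1000 pages, the binary search touches about 10 offsets per lookup.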
To do a bulk insert, you find the page where the insertion point belongs, split that page, then simply create a new page for your new data; then you only need to update the outer table to accommodate the two new pages and recalculate offsets, which has a worst case of about 2000 operations.
If you're doing lots of bulk inserts over time, then you'll eventually want some compaction algorithm to consolidate pages. Also, after you split pages to make room for your new insertion, if the first part of the split page is pretty small, you should consider simply appending your values to it rather than creating a new page. Data structures that use a paging scheme usually have some flexibility on page size, so that you aim for a certain size but can go up to double it, staying lazy about re-organizing the data structure.
This same question was asked pretty recently. Here is my solution:
def max_product(a):
    ans = pre_max = pre_min = a[0]
    for elem in a[1:]:
        new_min = pre_min * elem
        new_max = pre_max * elem
        pre_min = min(elem, new_max, new_min)
        pre_max = max(elem, new_max, new_min)
        ans = max(ans, pre_max)
    return ans

assert 0 == max_product([0, 0, -2, 0, 0, 0])
assert 8 == max_product([-2, 1, 1, 8])
assert 18 == max_product([-2, -3, 1, 3])
assert 12 == max_product([-3, -4, 0, 1, 2, 3])
assert 720 == max_product([1, -1, 10, -8, -9, 0])
assert 2 == max_product([-50, 0.5, 2])
assert 2 == max_product([-50, 2, 0.5])
I don't know of any obvious way to do this faster than N squared time. To do it in N squared time, first create a reverse mapping of numbers to the index(es) of the array containing that number. Then do an N-squared pass on pairs of numbers, take their sum, then look in the reverse mapping to find the index of the additive inverse of the pair's sum.
You can prune the outer edges of the list by finding the most negative and most positive numbers (in linear time), and then, during your N-squared pass, exclude numbers on the other side of zero that have more than double the magnitude. For example, if your largest positive number is 6, then you can immediately reject -13 as being part of the solution, since you'd need at least one number > 6.5 to get back to zero.
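Here's a rough N-squared sketch of the reverse-mapping idea (the function name is mine, and I return distinct value triples summing to zero rather than indexes, to sidestep duplicates):

```python
def three_sum_zero(nums):
    # Reverse mapping: each value -> the indexes where it occurs.
    index_of = {}
    for i, x in enumerate(nums):
        index_of.setdefault(x, []).append(i)
    triples = set()
    n = len(nums)
    for i in range(n):
        for j in range(i + 1, n):
            # Look up the additive inverse of the pair's sum.
            need = -(nums[i] + nums[j])
            for k in index_of.get(need, []):
                if k != i and k != j:
                    triples.add(tuple(sorted((nums[i], nums[j], nums[k]))))
                    break
    return triples
```

The dictionary lookup makes the third element O(1) on average, so the pair loop dominates at O(N^2).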
@laxman601, I think you should try to collapse the following lines of code to one or two lines of code, and avoid all the special casing. If you can use zero-based indexing, then this idiom is fairly common:
desired_rotation = (desired_chair + num_chairs - chair) % num_chairs
If you are dealing with a language that computes modulus correctly for negative numbers (e.g. in Python, -5 % 7 == 2, which is correct, but JS is stupid about it and gives you the mathematically awkward answer of -5), then go with this:
desired_rotation = (desired_chair - chair) % num_chairs
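In Python the shorter idiom works directly, since % always yields a result in [0, num_chairs); a tiny sketch (the function name is mine):

```python
def desired_rotation(chair, desired_chair, num_chairs):
    # Python's % is non-negative for a positive modulus,
    # so no "+ num_chairs" correction is needed.
    return (desired_chair - chair) % num_chairs
```

For example, moving from chair 3 back to chair 1 with 5 chairs is a clockwise rotation of 3, because (1 - 3) % 5 == 3.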
- showell30@yahoo.com March 19, 2013