eugene.yarovoi
The recurrence relation here, where X is the total size of the input, is T(X) = 2*T(X/2) + c, which yields T(X) = O(X). The asymptotic runtime is therefore equal to that of checking each element one by one.
- eugene.yarovoi November 04, 2013

There are no guarantees about which guard is guarding which door.
- eugene.yarovoi October 31, 2013

It's not wrong -- it's a general and extensible solution.
The reason for the XOR-based solutions is that these solutions can have a much lower constant factor, and the problem is trivial if you use a hash table, so one might think that a solution not based on hashing was desired.
The time complexity of the above code is O(m + s), where m is the length of each string and s is the size of the alphabet. If s could potentially be large in comparison to m, we can use a hash-based set instead of an array to make the complexity O(m).
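The code under discussion isn't reproduced here, but if the task is, say, an anagram check (an assumption on my part), the hash-based counting variant might look like this in Python:

```python
from collections import Counter

def are_anagrams(s1, s2):
    # O(m) expected time: Counter is a hash-based multiset, playing
    # the role of the array of counts when the alphabet is large.
    if len(s1) != len(s2):
        return False
    return Counter(s1) == Counter(s2)
```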
- eugene.yarovoi October 21, 2013

(n+1)^2 = n^2 + (2n + 1), and 2n + 1 is the (n+1)-th odd number, hence the relationship. The square root might not be an integer though, so we need a bit more here.
- eugene.yarovoi October 21, 2013

There is no solution in o(n) (that's little-o) because you have to look at all the characters of the string you're given.
As far as coding in O(n) (big-O), it's not that easy (though possible). I'd be interested in hearing how you solved it.
Though Miguel's right that you will never know for certain, a good way to get at least some more insight is to let us know the questions you were asked and what answers you gave, if possible. That way we could at least talk about whether your answers were as good as you think they were.
- eugene.yarovoi October 15, 2013

I'm guessing the intended answer was c) 600, though the question is very bad because it requires multiple assumptions to be made.
Let's say we assume that having 24-bit instructions means that instructions are laid out contiguously in memory (there are no, say, 8-bit empty gaps). Let's further assume that the address is measured in 8-bit bytes, and hence instruction pointers will increase in multiples of 24 / 8 = 3. (This is not a good assumption because the machine could instead have a word size of 24 bits, and then instruction pointers would advance in increments of 1 and all of the choices would be valid).
Based on the two assumptions above, we get that locations of the form 300 (starting address) + 3 * n are definitely valid. 400, 500, and 700 are not of this form, and 600 would be the answer.
@Varun: building the graph would involve O(n^2) API calls. Hence the use of the method I gave, which uses only O(n) calls.
- eugene.yarovoi October 08, 2013

To have a complete solution, you would need to mention an algorithm for finding a^(-1) without using division. After all, a^(-1) is sometimes *defined* as 1/a.
It looks like what you mean to say is that you will find a way to do something equivalent to division without dividing directly...which might be OK, but you'd need to give an algorithm for that.
@Miguel: check out my answer. I give an O(n) time, O(1) space method for finding two missing numbers without modifying the input array.
- eugene.yarovoi October 01, 2013

Take the XOR of all the numbers in the array together with all the numbers in the expected range -- the result is the XOR of the two missing numbers. Since they differ in at least 1 bit, the XOR is nonzero in at least 1 bit. Identify one such bit position.
Now, make a second pass through the array and the expected range, XORing only numbers with that bit position set to 0 and ignoring the others. You will get one of the missing numbers as a result. Make a third pass that considers only numbers with that bit position set to 1 to find the second missing number.
It's not apparent how to generalize this idea to 3 or more missing numbers, because 3 missing numbers that are all distinct could have an XOR of 0.
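A sketch of the three-pass idea in Python, assuming the array holds distinct values drawn from 1..n with exactly two missing (each pass over the array is paired with a pass over the expected range):

```python
def two_missing(arr, n):
    # Finds the two numbers missing from arr, assumed to hold n-2
    # distinct values drawn from 1..n. O(n) time, O(1) extra space,
    # and the input array is not modified.
    x = 0
    for v in arr:               # XOR of what is present...
        x ^= v
    for v in range(1, n + 1):   # ...against what is expected
        x ^= v
    # x is now the XOR of the two missing numbers; pick any set bit.
    bit = x & -x
    a = b = 0
    for v in arr:
        if v & bit:
            a ^= v
        else:
            b ^= v
    for v in range(1, n + 1):
        if v & bit:
            a ^= v
        else:
            b ^= v
    return sorted([a, b])
```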
@Adnan: Alva's code is correct. The manner in which you treat the count == 0 case is critically important. You cannot just set the count to 1 like that.
To understand why the algorithm is the way it is, consider that the algorithm is really an efficient implementation of the following idea:
S = the set of all delegates
while (there are delegates from at least 2 distinct parties remaining)
{
take 2 delegates from distinct parties, and remove them from S
}
return any member of S (they are all of the same party now)
It takes a little bit of thinking to see why the formulation I give above is equivalent to the algorithm discussed so far. Once you see the connection, you'll know exactly when you should add to or subtract from the count.
With this way of thinking, we can easily solve the more general problem of finding a delegate from each party that has strictly more than N/K delegates. The problem as originally posted is just the K = 2 version of this more general problem. It's just a matter of using K instead of 2 in the pseudocode I gave, and finding an efficient data structure to implement it.
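A hedged sketch of the generalized K-party version (this is essentially the Misra-Gries heavy-hitters algorithm; the function name and the verification pass are my additions):

```python
def heavy_parties(delegates, k):
    # Generalization of the pairing idea: track at most k-1
    # candidate parties with counts.
    counts = {}
    for p in delegates:
        if p in counts:
            counts[p] += 1
        elif len(counts) < k - 1:
            counts[p] = 1
        else:
            # Discard this delegate together with one delegate from
            # each tracked party: k distinct parties removed at once.
            for q in list(counts):
                counts[q] -= 1
                if counts[q] == 0:
                    del counts[q]
    # Verification pass: keep only parties with strictly more
    # than n/k delegates.
    n = len(delegates)
    return {p for p in counts
            if sum(1 for d in delegates if d == p) * k > n}
```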
That's not correct. There could be a URL that is not in the top 10 for any one server, but is in the top 10 overall. See tony's response to Miguel Oliveira's answer.
- eugene.yarovoi September 30, 2013

@Itanium: No part of my comment says or implies that space is subordinate to time. I was responding to this part of your post:
"If you just say efficiency it refers to product of Space & Time efficiencies.
i.e. Efficiency = SxT"
You're presenting it as if it were a common definition. I was saying that, as far as I know, this definition is completely made up, and it doesn't exist as a recognized concept. Nor is defining it as "proportional to" space * time a recognized definition.
@Itanium: Does it? I have never heard anyone define "efficiency" as space complexity * time complexity.
- eugene.yarovoi September 26, 2013

Your approach seems overcomplicated. Maybe we understand the problem differently?
Can you tell me what you think is wrong with my argument? I've given a precise argument why exactly one person will be eliminated with every question. To recap:
If A knows B, A cannot be the mayor, since the mayor knows no one.
If A doesn't know B, B cannot be the mayor, since everyone knows the mayor.
Therefore, we can always eliminate one person with every question.
We start with N people.
Therefore, once we ask N-1 questions, we'll be down to a single person.
If there exists a mayor, that person must by necessity be the mayor. If there is not a mayor, then we may have returned a bogus answer. Hence the validation step that I mention.
Take any 2 people in the set, ask if A knows B. If A knows B, A is not the mayor. If A does not know B, B is not the mayor. Either way, one person is always eliminated. After asking n - 1 such questions, only one person will remain. You can then validate that the remaining person is in fact the mayor by asking whether they know everyone else, and whether everyone knows them (2 * (n-1) questions). I haven't tried to optimize the constant factor, but it's O(n) questions.
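The elimination-plus-validation procedure above might be sketched like this (a sketch, not the poster's code; `knows(a, b)` stands in for asking whether a knows b):

```python
def find_mayor(knows, n):
    # knows(a, b) answers "does a know b?". One elimination question
    # per step, then 2*(n-1) validation questions. Returns the
    # mayor's index, or None if no mayor exists.
    candidate = 0
    for person in range(1, n):
        if knows(candidate, person):
            candidate = person   # the old candidate knows someone
        # else: person cannot be the mayor (everyone knows the mayor)
    # Validation: the survivor must know nobody and be known by all.
    for other in range(n):
        if other == candidate:
            continue
        if knows(candidate, other) or not knows(other, candidate):
            return None
    return candidate
```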
- eugene.yarovoi September 24, 2013

The goal is to give an algorithm that is guaranteed to eventually find the object. If you just keep going forward, you may never find the object.
Suppose that the object we're trying to find is a distance of N from the starting point, though we have no knowledge of the value of N or its distribution among the possible test cases. We know that, for obvious reasons, no strategy can take fewer than N steps (whatever N is) to reach the object. If we don't know where the object is, no strategy can even always do it in exactly N steps, because there's the possibility of starting out moving in the wrong direction. It makes sense to ask whether there's a strategy that always achieves no worse than O(N) steps. Such a strategy would imply that with absolutely no initial knowledge about the whereabouts of the object, you can always reach it in a number of steps that is only a constant factor worse than the number of steps needed to reach it if you know exactly where it is ahead of time (it takes exactly N steps to reach it if you know where it is). The answer by zodiak770 describes one such algorithm.
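zodiak770's algorithm isn't reproduced in this thread, but an expanding-sweep strategy along these lines achieves the O(N) bound (a sketch; `is_target(pos)` is a hypothetical probe, and the doubling base can be tuned):

```python
def find_on_line(is_target, base=2):
    # Expanding sweeps: go out `reach` steps in one direction,
    # return to the origin, multiply the reach by `base`, and sweep
    # the other way. Total steps are O(N) for a target at distance N.
    if is_target(0):
        return 0, 0
    pos, steps, reach, direction = 0, 0, 1, 1
    while True:
        turn_at = direction * reach
        while pos != turn_at:          # sweep outward, probing
            pos += direction
            steps += 1
            if is_target(pos):
                return pos, steps
        while pos != 0:                # walk back to the origin
            pos -= direction
            steps += 1
        reach *= base
        direction = -direction
```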
In the expectation, each list will have only O(1) entries. If a list only has O(1) entries, both insert and delete are O(1) for both arrays and linked lists.
I don't think anyone ever assumed resizing the underlying array was a big deal in terms of the efficiency of the resize operation itself. Keep in mind that even searching a bucket (the most common operation) is proportional to the number of elements in the bucket, so if touching every element in the bucket were a problem, hash tables wouldn't really work well.
@Anonymous: 3-SUM, not 3-SAT.
- eugene.yarovoi September 17, 2013

@karaamaati: Thanks for adding code, but there's one critical mistake. It would be nice if you could edit your response to correct it. sizeof(char) is always 1, so you aren't really using bits here. You wanted the constant CHAR_BIT (use #include <limits.h>), which will (usually) be equal to 8. I also notice that in some places, you've directly used the constant 8, so you should convert those usages to CHAR_BIT as well for consistency.
There are some smaller stylistic issues too, for example "return condition ? true: false;" can be simplified to "return condition;". "complicatedThingThatTakesLotsOfTyping = complicatedThingThatTakesLotsOfTyping & expression" can be simplified to "complicatedThingThatTakesLotsOfTyping &= expression".
For the case where we don't know in advance that most elements will be T, a good approach would be to keep a growable bitset. This is a growable array of integers, used as a bitset for space efficiency. You could write a simple ArrayList implementation that initially allocates an array of integers of some small size, and every time the array runs out of space, doubles the storage and copies the existing data over to the new storage. The i-th Boolean in the collection would be stored in the (i%32)-th bit of the (i/32)-th integer, if we assume that integers are 32-bit. This data structure has O(1) amortized add, O(1) set, and O(1) get. The space usage is O(total number of values in the collection) + O(1).
If we know that very few values are F and we're interested in saving space, we can keep a number to tell us how many total T and F values we have, and then keep a hash set of all the F values. To determine the value at an index, just return !hashSet.contains(index). To set an index, remove it from the hashset if setting to true; add it to the hash set if setting to false. This data structure has O(1) expected time add, set, and get. The space usage is O(number of F values) + O(1).
In both of the above implementations, the length() operation would return, in O(1) time, a previously-stored count (that is updated in O(1) time as elements are appended).
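A minimal sketch of the first (growable bitset) variant in Python; `bytearray` supplies the doubling array underneath, and the class name is my own:

```python
class BitList:
    # Append-only list of booleans packed 8 per byte.
    def __init__(self):
        self._bits = bytearray()
        self._n = 0

    def append(self, value):
        if self._n % 8 == 0:
            self._bits.append(0)   # amortized O(1) growth
        self._n += 1
        self.set(self._n - 1, value)

    def get(self, i):
        return bool((self._bits[i // 8] >> (i % 8)) & 1)

    def set(self, i, value):
        if value:
            self._bits[i // 8] |= 1 << (i % 8)
        else:
            self._bits[i // 8] &= ~(1 << (i % 8))

    def __len__(self):
        return self._n             # O(1) length, as noted above
```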
It doesn't work this way. p[2] is syntactic sugar for *(p+2).
Also, if p refers to address 1000, p + 2 is not 1002. p + 2 = 1000 + 2*sizeof(*p) = 1000 + 2 * sizeof (int) = 1008 for 4-byte ints.
Reading the system clock is usually going to be considerably more expensive than doing arithmetic or converting integers to strings. Besides, if you have to wait for the clock to change, you are probably constantly polling the clock, which means you're probably doing a lot more than just 100 reads. Comparisons to determine whether the time has changed since the last time you polled the clock are also not free.
It's not true that you won't create any variables, because you will need one to hold the time returned from the system clock, and you will also need a variable to hold the previous time returned by the system clock so that you can see whether it has changed. You'll also need one to hold the clock start time and / or the end time. You can see this program will have at least a few variables.
We might ask what base results in the best constant for that O(n) runtime. Using a base of sqrt(2) + 1 results in the best constant for the worst-case runtime out of all the exponential-based solutions.
Consider that the worst case is when our target value is some number N, but in the previous pass, we only reach N-1 before turning back. So, in that previous pass, we moved 4*(N-1) units (forward and back in both the positive and negative directions). Furthermore, if our base is b, then in the passes before that one, we must have taken 4*(N-1)/b + 4*(N-1)/b^2 + ... + 4*1 steps. The total of this and 4*(N-1) is approximately 4*(N-1) / (1 - 1/b) by the formula for the sum of a geometric series.
Now consider the pass on which we find the target. First we might go in the other direction and back to 0, taking 2*(N-1)*b steps. Then we finally reach the target in another N steps. So we have that in the worst case, reaching the target takes 4*(N-1) / (1 - 1/b) + 2*(N-1)*b + N steps. We can ask for what value of b this expression is smallest. Fix N while treating r = 1/b as a variable. Taking the derivative with respect to r and setting it to 0 gives -1/r^2 + 2/(r-1)^2 = 0. Solving this, we get r^2 + 2r - 1 = 0. This has a positive solution at r = sqrt(2) - 1, that is, b = 1 / (sqrt(2) - 1) = sqrt(2) + 1 =~ 2.414.
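As a sanity check on the algebra, the worst-case expression can be evaluated numerically; b = sqrt(2) + 1 should be no worse than nearby bases:

```python
import math

def worst_case_steps(b, n):
    # The worst-case step count derived above: earlier passes plus
    # the final overshoot-and-return pass.
    return 4 * (n - 1) / (1 - 1 / b) + 2 * (n - 1) * b + n

best = math.sqrt(2) + 1
for other in (1.5, 2.0, 2.5, 3.0, 4.0):
    assert worst_case_steps(best, 10**6) <= worst_case_steps(other, 10**6)
```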
The idea of some compression schemes is that you can save space on frequently-occurring bit sequences by encoding them with shorter bit sequences. You would then save on the more frequently-occurring bit sequences at the expense of the less frequently-occurring ones.
That's not really what we're talking about, though. To represent just one object (just one selection of two positions on an 8x8 chessboard), if you want to be able to represent all 2016 possibilities, you cannot do it in less than ceil (log_2(2016)) = 11 bits. There is no way around this.
For that matter, even if we talk about space usage when you need to store N objects, you can't be guaranteed to be able to represent N such objects in less than (roughly) 11N bits. Compression schemes are only effective when some objects occur more frequently than others. Unless you know for a fact that your distribution of objects is non-random, you can't be guaranteed any space savings.
If you consider 10 and 00010 as two different numbers, you are not working in base 2. You've implicitly introduced a third type of symbol: the space. The designation "bit" applies to a 0 or 1 symbol in base 2.
- eugene.yarovoi September 01, 2013

That they interviewed you doesn't mean your lack of experience didn't count against you. It only means that (at the time they scheduled your interview) they thought you might stand a chance without the experience. So it might mean:
1. The team wasn't sure how good of a candidate they'd find, so they were willing to consider inexperienced candidates, but then they found someone who had both experience and good interview performance.
2. You did well, but not well enough to compensate for not having the experience.
3. You didn't do well.
This won't work because there can be multiple dictionary words that start at the same index. Consider "flipper" and a dictionary of {"flipper", "flip"}.
- eugene.yarovoi August 30, 2013

You're treating space differently from 0, which means your number system is not strictly base 2, and your solution cannot qualify as using only 10 bits.
@Brennahan: it is not possible to mash 2016 combinations into a bit representation that holds 1024 combinations at max. Consider any function that maps a 10-bit value to a set of 2 positions. Clearly, since there are only 1024 possible inputs, there are only 1024 possible outputs. Some of the 2016 sets of 2 positions therefore have no 10-bit value that can represent them.
Let's consider how many ways there are to place two pieces on an 8x8 chessboard. The first piece can be placed in one of 64 locations. The second piece can be placed in any one of the 63 remaining locations. Then, assuming you don't care about the order of locations, every pair of locations (i, j) is equivalent to the pair (j, i), so to get the total number of different pairs of locations, we need to divide by 2.
We get 64 * 63 / 2 = 2016.
There is therefore no way to do this with 10 bits, since a 10-bit value can only have one of 2^10 = 1024 possible values. With only 10 bits, only 1024 of the 2016 possibilities could have a representation. All the claims in this thread that you can do it with 10 bits are groundless.
With 11 bits, you can represent 2048 values, which is sufficient. The algorithm for encoding two locations as an 11-bit value is relatively straightforward. Give each chess square a number from 1 to 64. Then represent a pair of locations as the smaller number followed by the larger number. Map an integer to each pair in order, like this:
(1, 2) -> 0
(1, 3) -> 1
...
(1, 64) -> 62
(2, 3) -> 63
(2, 4) -> 64
...
(2, 64) -> 124
(3, 4) -> 125
...
(63, 64) -> 2015
You can do some simple math to compute the mapping efficiently.
The locations then have a 1-to-1 correspondence with integers between 0 and 2015. 11 bits are necessary and sufficient to represent these.
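The "simple math" for the mapping might be sketched as follows (function names are mine; squares are numbered 1 to 64 as above):

```python
def encode_pair(i, j):
    # Maps an unordered pair of distinct squares to its position
    # 0..2015 in the enumeration shown above.
    a, b = min(i, j), max(i, j)
    # Pairs beginning with 1..a-1 come first: (64-1) + (64-2) + ...
    offset = sum(64 - k for k in range(1, a))
    return offset + (b - a - 1)

def decode_pair(code):
    a = 1
    while code >= 64 - a:   # skip whole blocks of pairs (a, *)
        code -= 64 - a
        a += 1
    return a, a + 1 + code
```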
The new revision isn't correct either. There's no reason why parsing from either the front or the back greedily would be correct. There are some cases for which it happens to be right, but add "pper" to your array variable and you'll see wrong output.
- eugene.yarovoi August 28, 2013

No need to use binary search. The middle of the array is given by arr[arr.length / 2].
- eugene.yarovoi August 22, 2013

This problem is called "set cover", and is a classic NP-complete problem. There are approximation algorithms and heuristics that you could employ. If you only need to solve this for very small inputs, you could just do it by trying every combination (try all combinations of size k before trying any of size x, if k < x).
- eugene.yarovoi August 22, 2013

I think this was supposed to be a special case of the Josephus problem. The question could have been worded a bit more clearly.
- eugene.yarovoi August 22, 2013

That seems intuitive, but isn't actually correct. Consider the following example:
Dictionary words: flip, flipper, dolphin
String: flipperdolphin
With your approach, you would see "flip" and consider it a word. You would then be left with perdolphin, which cannot be split into dictionary words. However, if you had taken the split "flipper dolphin", you would have found a solution.
Our preference for splitting words as early as possible does not imply that the earliest split is always correct, because the earliest possible split does not always lead to a valid solution. We therefore cannot take the direct greedy approach you're taking.
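The standard fix is dynamic programming over split positions rather than a greedy choice -- a sketch, assuming only a yes/no answer is needed:

```python
def can_split(s, words):
    # ok[i] is True when s[:i] can be split into dictionary words.
    # Trying every split point avoids the greedy trap above.
    words = set(words)
    ok = [True] + [False] * len(s)
    for i in range(1, len(s) + 1):
        ok[i] = any(ok[j] and s[j:i] in words for j in range(i))
    return ok[len(s)]
```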
How much experience do you have? A 1-page resume is generally appropriate for out-of-college hires or candidates with only a small amount of industry experience. More experienced candidates will sometimes have longer resumes.
- eugene.yarovoi August 15, 2013

This is incorrect. Suppose M = 10, N = 1, and i = 1. The formula does not yield the expected value of 1. The flaw in the analysis above is that (2) does not hold because the events are not independent.
- eugene.yarovoi August 15, 2013

The problem is not stated very clearly. I will try to reword it:
You're given an array of strings, for example {fog, fool, food, fruit, for}. You pick a string S, and for each string x in the array, if S is a prefix of x, you remove S from the beginning of x. In other words, if x could be written as S.concat(y), where y is some (possibly empty) string, you replace x with y. The goal is to find the choice of S that minimizes the sum of all string lengths in the array after this replacement + length(S).
This can be solved in optimal O(M) time, where M is the total number of characters present in the input, by using a prefix tree. Build a prefix tree that is augmented at each node with a counter of how many strings in the prefix tree begin with the prefix defined by the path to that node. For example, if the prefix tree contains the words "cow" and "cool", it would look like
root (2) -> c (2) -> o (2) -> o (1) -> l (1)
                         |
                         -> w (1)
I hope the above formats correctly when people look at it. The top c and o nodes have a count of 2 because there are 2 words that begin with "c" and "co" respectively. The next "o" node has a count of 1 because only 1 word begins with "coo".
Now, you want to choose S to be the string defined by some path from the root to a node in this prefix tree. If you don't, your choice cannot be optimal, since there are characters at the end of your chosen string that could be removed to improve the cost. The question now becomes which path we should choose.
We can do a traversal of the prefix tree. Each node corresponds to a possible choice of S. At each node, we have the count of how many strings in the array have that choice as a prefix. All we need to do is multiply the count by the depth of the node (length of the corresponding choice for S), which we maintain during the traversal, to determine how many total characters will be removed by that choice of S. We then subtract the depth of the node once to compensate for the length(S) term in the cost function. The result is the cost reduction that would be achieved by this choice of S. Minimizing the cost is equivalent to maximizing the cost reduction. We keep a reference to the node seen so far that has the largest cost reduction as we traverse. After the traversal, we do a depth-first search to find the path from the root to that node, and reconstruct the optimal choice of S from the path.
A prefix tree can be constructed in O(M) time, traversed in O(M) time, and DFS searched in O(M) time, where M is the total number of characters in the input. Adding an additional count to each node and maintaining the depth and the maximum so far during the traversal does not change the asymptotic bounds, so the total time is O(M). The total space will also be O(M), assuming storing a count takes O(1) space per node.
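A condensed sketch of the counted prefix tree and traversal (carrying the prefix string along during the traversal stands in for the separate path-reconstruction DFS described above; class and function names are mine):

```python
class Node:
    def __init__(self):
        self.children = {}
        self.count = 0   # strings having this node's prefix

def best_prefix(strings):
    # Build the counted prefix tree in O(M) total characters.
    root = Node()
    for s in strings:
        node = root
        node.count += 1
        for ch in s:
            node = node.children.setdefault(ch, Node())
            node.count += 1
    # Traverse, maximizing count(S)*len(S) - len(S), i.e.
    # (count - 1) * depth, the cost reduction described above.
    best, best_reduction = "", 0
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        reduction = (node.count - 1) * len(prefix)
        if reduction > best_reduction:
            best, best_reduction = prefix, reduction
        for ch, child in node.children.items():
            stack.append((child, prefix + ch))
    return best
```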
Binary trees or binary search trees?
- eugene.yarovoi August 06, 2013

The O(m + n + R) runtime is for the approach of using a bitset of size R, not a hash table. The R comes into the equation because you have to initialize the bitset.
The worst-case runtime of the hash table approach is actually O(m*n) in the case that all the entries end up in the same bucket.
If the question instead was to solve the problem for two sorted arrays, there is a simple comparison-based O(m+n) time algorithm whose logic is very similar to the "merge" step of mergesort. Step through the two arrays in the same way that the merge algorithm does, and when the next element from array1 is equal to the next element from array2, add the common number to a list of results.
For two unsorted arrays, you can solve using a hash set, but this is O(m+n) only in the expectation and not in the worst case. You can solve the problem in worst-case O(m+n+R) time where R is the range of the numbers, if you also have O(R) space, by using a bitset. Finally, if you are programming in a language that does not initialize memory, you can solve the problem with O(R) space and O(m+n) (no R) time by combining the bitset approach with uninitialized memory tricks.
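The merge-style intersection for the sorted case might look like this (a sketch; emitting one result per matching pair is one possible choice for duplicates):

```python
def sorted_intersection(a, b):
    # Merge-style walk over two sorted arrays; O(m + n) time,
    # mirroring the "merge" step of mergesort described above.
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            result.append(a[i])
            i += 1
            j += 1
    return result
```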
I agree with the T(n*m) = 3*T(m*n/4) + c. Sorry I misquoted you. I had believed you to be claiming T(n) = 3*T(n/4) + C.
My central point that your analysis is incorrect, however, will have to remain. What do you believe your recurrence relation evaluates to? Do you believe it's O(log (MN))? It's actually O((MN)^log_4(3)). When you consider the M = N case, you get the bound of O(N^log_2(3)) that I mentioned earlier. For the N x N case, this runtime is asymptotically worse than the runtime of the O(M + N) algorithm.
Please, when you claim something that is not at all obvious, justify your answer. In trying to come up with a justification, you might discover that what you're claiming is not correct.
- eugene.yarovoi July 26, 2013

The question is not ill-posed. Think of it like this: suppose you don't know the maximum floor number or anything about the probability distribution, but you want to give the best bound on the worst-case number of drops, expressed in terms of N, where N is the lowest floor on which an egg breaks.
For example, using only the first egg and searching floors linearly is O(N). In that strategy, we would drop the egg from the 1st floor, then 2nd, then 3rd, and so on, and we stop when we reach the N-th floor and the egg breaks. We now know the answer is N, and it took us N drops to find out. We didn't need to have any upper bound on the answer to execute this strategy.
The strategy where we binary search with the first egg and then linearly search with the second afterwards is also O(N) in the worst case (the linear search could be expensive). The question would be asking about whether we can give better bounds, even if we don't know how large N will end up being.
@masterjaso: The binary search strategy works if you have about 2*log N eggs, where N is the floor to be found. Otherwise, it is not optimal. That's the point I was trying to get across.
If you only have 2 eggs, binary searching with the first egg gives you an algorithm that is O(N) on average, since the linear search with the second egg will consume a lot of time. In the average case, the binary search is not asymptotically better than using only one egg and doing a linear search from the start -- it is only a constant factor better.
Note that the arrays are already sorted in this problem.
Binary searching will still take O(log n) per element, and so you will get O(n log n) complexity with your approach as you mentioned.
However, since the arrays are passed to you sorted, you could have simply advanced two pointers, one through each of the two arrays, to finish with the optimal O(n) complexity. It would be a similar technique to how the "merge" step of mergesort steps through two sorted arrays.
Usually hashing implementations do something smarter than just taking value % numBuckets, and so you usually don't have to be very fearful that all your numbers will wind up in the same bucket. In practice, hashing tends to be a good solution.
Your point that an O(n^2) worst case is possible is correct, though. Some people do use sorting for solving this problem for that reason (and also because sorting has low constant factors, is simple, and can have good cache performance).
This only works if the same city isn't visited more than once during the itinerary. To solve without this assumption, you need to find an Euler tour of the graph representing the travel (there may be multiple possible itineraries if a city can be visited more than once, though, as EulGam05 mentioned).
- eugene.yarovoi November 05, 2013