eugene.yarovoi
This is actually kind of an interesting problem. You need to make several levels of arrays, each level containing elements that are each the minimum of two adjacent elements in the level below.
If you have 1 2 3 4 5 6 7 8
Your first summary level will be 1 3 5 7
The next will be 1 5
and the final just 1
I'll leave it to you to determine how to use this to find the minimum of any range in log N time. Hint: a range like 3-7 can be broken into 3-4 and 5-6 (each a single lookup in the first summary level) plus the single element 7. You can break any range into just O(log N) pieces that can be looked up in the summary arrays.
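To make this concrete, here's a minimal sketch of the summary levels plus the O(log N) query (essentially a segment tree stored level by level). The class and method names are my own, not part of the original question:

class RangeMin {
    private final int[][] level; // level[k][i] = min of the block of size 2^k starting at index i*2^k

    RangeMin(int[] a) {
        int levels = 1;
        while ((1 << (levels - 1)) < a.length) levels++;
        level = new int[levels][];
        level[0] = a.clone();
        for (int k = 1; k < levels; k++) {
            int[] prev = level[k - 1];
            int[] cur = new int[(prev.length + 1) / 2];
            for (int i = 0; i < cur.length; i++) {
                int left = prev[2 * i];
                int right = (2 * i + 1 < prev.length) ? prev[2 * i + 1] : left;
                cur[i] = Math.min(left, right); // one element summarizing two below
            }
            level[k] = cur;
        }
    }

    // Minimum of a[lo..hi], inclusive. Each iteration takes the largest
    // aligned block that fits, so only O(log N) lookups are made in total.
    int query(int lo, int hi) {
        int best = Integer.MAX_VALUE;
        while (lo <= hi) {
            int k = 0;
            while (k + 1 < level.length
                    && lo % (1 << (k + 1)) == 0
                    && lo + (1 << (k + 1)) - 1 <= hi) {
                k++; // grow the block while it stays aligned and inside [lo, hi]
            }
            best = Math.min(best, level[k][lo >> k]);
            lo += 1 << k;
        }
        return best;
    }
}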
This is a very complex question that needs a detailed analysis of the requirements and typical use cases. Only then can we properly weigh the different possible solutions. Below are just some simple thoughts off the top of my head:
A simple idea is to keep 2 data structures. First, I'd have a doubly-linked-list-based queue of page accesses, sorted by date. Each node would have the timestamp, the URL, and a pointer to the node in the queue representing the next older access of that same URL. The queue would be capped to both a certain size and a certain time range, and I could delete entries from the back of the queue as needed.
Second, I would have a tree-based map (a TreeMap in Java), sorted alphabetically by URL domain name first and then by page URL within that domain, mapping each URL to a pointer to the node in the doubly-linked list that represents the most recent access to that page.
That way, if someone wanted to look at when all their accesses to a specific page occurred, it could be done very quickly. They'd be presented with URLs in alphabetical order and select the right one; a lookup would return the record of the most recent access, which in turn has a pointer to the next older access, and so on. And if they wanted to see their *entire* history of visited sites, they could view the whole queue as a chronological log of everything they've visited.
The use of additional data structures is possibly desirable. For instance, maybe we'd store a pointer somewhere to a node that's a day old, a node that's 2 days old, etc., so certain timeslices of the history could be presented with the least amount of node traversals. Of course these pointers would have to be updated too, but background work is much better than having time during which the user is actively waiting for something to happen.
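Here is a minimal sketch of the two in-memory structures described above, with hypothetical names of my own. For simplicity the map is keyed by the raw URL rather than domain-then-page, and the size/age capping is left out:

import java.util.TreeMap;

class BrowserHistory {
    static class Access {
        final long timestamp;
        final String url;
        Access newer, older;     // doubly-linked queue, newest at the head
        Access nextOlderSameUrl; // the next older access of this same URL
        Access(long t, String u) { timestamp = t; url = u; }
    }

    private Access head, tail; // head = most recent access
    private final TreeMap<String, Access> latestByUrl = new TreeMap<>();

    void recordVisit(String url, long timestamp) {
        Access a = new Access(timestamp, url);
        // TreeMap.put returns the old value, which is exactly the pointer
        // to the previous access of this URL that the new node should hold.
        a.nextOlderSameUrl = latestByUrl.put(url, a);
        a.older = head;
        if (head != null) head.newer = a; else tail = a;
        head = a;
        // evicting old entries from `tail` (and patching latestByUrl when the
        // evicted node is still the latest access of its URL) is omitted
    }
}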
That's just if it's an in-memory structure. Another thought is that you need to persist this data structure. So maybe I'd rather use something that's more like a database (but local, without the whole remote server aspect that technologies like SQL Server have). If you used an indexed database table, you'd want at least an index on the date and an index on the URL.
Or maybe you want to know *all* the times when a page was visited, not just the most recent. The requirements need to be clarified.
- eugene.yarovoi September 03, 2012
Having it executed from finally is not really an option, if you consider that some exception HAS TO arise in order for the finally block to be reached at all (well, you could maybe find some way to execute all that from within a catch block and suppress the exception, but I'm not sure that's what you want).
However, you can try this:
public class ExitInterceptor extends SecurityManager { // hypothetical class name; a SecurityManager is implied by super.checkExit
    @Override
    public void checkExit(int status) {
        super.checkExit(status);
        //hehehe
        if (status == 0) {
            System.out.println("Good Bye");
        }
    }
}
// install it with: System.setSecurityManager(new ExitInterceptor());
No one's denying that it's possible to use insertion sort. But it's not the most efficient algorithm here.
- eugene.yarovoi September 03, 2012
Mentioning at least a little bit about what strategy they use, instead of just inviting readers to look through a huge open-source project, would be a good idea.
- eugene.yarovoi September 03, 2012
Consider the following extremely simple greedy algorithm. Sort the pairs by the second element in the pair. You will always include the first pair in this sorted order in your maximum chain. Then, for each subsequent pair in this sorted order, if the pair can follow the last pair that was included in the maximum chain, take it. Otherwise, do nothing for that pair and just go to checking the next pair. That's it.
This has O(N) performance once the input has been sorted.
Why is this greedy algorithm optimal? I'll try to give a proof that's fairly rigorous.
Define F(z) as the size of the longest possible chain we can get if we only look at pairs where x > z. Clearly, F(z) is non-increasing, since any longest chain considered to calculate F(b) can also be considered for F(a) if a < b. So, if a < b, F(a) >= F(b). The answer to the problem is F(-infinity).
Consider evaluating F(z) for some z by finding the maximum chain out of pairs where x > z. Consider the first pair in the maximum chain. Whichever pair that is, we can never take any pairs that have lower y into the chain, since that would mean the current pair is not the first in the chain. So once we choose a pair to be the first pair, the size of the longest chain we can build is 1 + F(the first pair's y). (That's one to count the first pair itself, and then the longest chain we can build from pairs that can follow the first one in a chain: pairs where x is greater than the first pair's y.) So, we know that to maximize the size of the chain, we must maximize F(the first pair's y). As we showed before, maximizing F(the first pair's y) corresponds to minimizing the first pair's y. That means we should always choose the element with the smallest y that satisfies x > z.
We now have an algorithm for evaluating F(z) for any z. Consider only pairs where x > z, and choose the one with the smallest y. If the input is sorted in order of y, that just means choose the first one that has x > z. Then, the answer is 1 + F(that pair's y).
Call the i-th element in the maximum chain C[i]. For the overall maximum chain algorithm, the solution is F(-inf). Since each pair's x > -inf, the first pair in the chain, C[0], should just be the one with smallest y out of all the pairs (the first one in the sorted input). The maximum chain is C[0] + the maximum chain corresponding to F(C[0].y). So any subsequent pair included in the chain, C[i], should be the first pair in the sorted input that has x > C[i-1].y. Since x > C[i-1].y implies y > C[i-1].y, such a pair can only occur later in the sorted input than C[i-1]. So we scan the sorted input starting where we found C[i-1] and take the first element that satisfies x > C[i-1].y as C[i]. Then we go on to scanning from that point for C[i+1], and so on. It's clear that we need to make only one traversal through the sorted array, since scanning for C[i+1] picks up where scanning for C[i] left off. By the time we're done traversing, we have the maximum chain of pairs.
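Here's a minimal sketch of this greedy in Java, assuming a hypothetical Pair class with fields x and y:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MaxChain {
    static class Pair {
        final int x, y;
        Pair(int x, int y) { this.x = x; this.y = y; }
    }

    static List<Pair> maxChain(Pair[] pairs) {
        Arrays.sort(pairs, (a, b) -> Integer.compare(a.y, b.y)); // sort by second element
        List<Pair> chain = new ArrayList<>();
        for (Pair p : pairs) {
            // take p iff it can follow the last pair taken (or is the first pair)
            if (chain.isEmpty() || p.x > chain.get(chain.size() - 1).y) {
                chain.add(p);
            }
        }
        return chain;
    }
}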
And how will you take care of those invalid cases without a stack or something conceptually equivalent?
- eugene.yarovoi September 03, 2012
I see what you're saying. We must have a different picture of what exactly constitutes a bubble sort. The way I learned to do it, way back in the day, has the best, worst, and average case at O(n^2). Insertion sort, on the other hand, can be as bad as O(n^2) or as good as O(n).
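For reference, a standard insertion sort sketch. On already-sorted input the inner loop body never executes, which is where the O(n) best case comes from; bubble sort as classically taught makes a full pass regardless:

static void insertionSort(int[] a) {
    for (int i = 1; i < a.length; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key) { // zero iterations when a[i-1] <= a[i] already
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}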
- eugene.yarovoi September 03, 2012
Wouldn't the graph as you're constructing it have E bounded only by O(V^2)?
- eugene.yarovoi September 02, 2012
Sorry, that was a typo. I have corrected it above.
- eugene.yarovoi September 02, 2012
I've never actually had using NULL in contexts like that not work for me, but NULL is supposed to be for pointers only. You're supposed to do
while (str[retval++] != '\0') or just while (str[retval++])
Also, you're counting the null terminator in your character count.
So where do you get these numbers?
- eugene.yarovoi September 02, 2012
The good news about using mergesort for linked lists is that it doesn't really require extra space like it does for arrays.
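To illustrate, a minimal sketch of mergesort on a singly-linked list (Node and the method names are mine). The merge step just relinks existing nodes, so there's no O(n) auxiliary buffer, only the O(log n) recursion stack:

class ListMergeSort {
    static class Node {
        int val;
        Node next;
        Node(int v) { val = v; }
    }

    static Node sort(Node head) {
        if (head == null || head.next == null) return head;
        Node slow = head, fast = head.next; // split in half with slow/fast pointers
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
        }
        Node second = slow.next;
        slow.next = null;
        return merge(sort(head), sort(second));
    }

    static Node merge(Node a, Node b) {
        Node dummy = new Node(0), tail = dummy;
        while (a != null && b != null) { // relink nodes instead of copying
            if (a.val <= b.val) { tail.next = a; a = a.next; }
            else                { tail.next = b; b = b.next; }
            tail = tail.next;
        }
        tail.next = (a != null) ? a : b;
        return dummy.next;
    }
}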
- eugene.yarovoi September 02, 2012
You're saying your equality criterion will be the URLs of two PageVisitedRecord objects matching, and yet your tree ordering criterion will be the date. So two PageVisitedRecord objects could be equal (same URLs) and yet still be supposed to be inserted in different places in the tree. This won't work, for a variety of reasons.
First, when you update the set with an equal PageVisitedRecord to something you already have in the tree (equal by your equality criterion -- the URLs match), how will the set search for this equality? The tree is sorted by date.
Second, even if this update did somehow work, the set might do something stupid. It might think that since you're not inserting a new element, it doesn't need to move the existing element. That might be bad because your tree is sorted by date and your new element has a new date that might need to be in a different place in the tree. Or the set might think that it can just ignore your update request, since inserting an element into a set that is already there should have no effect, in theory. Then the date might not get updated.
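A small demonstration of the problem, assuming the record class looks roughly like this (all names hypothetical). A TreeSet decides equality purely via its comparator, so records that are "equal" by URL but ordered by date simply end up as separate elements:

import java.util.TreeSet;

class PageVisitedRecord {
    final String url;
    final long date;
    PageVisitedRecord(String url, long date) { this.url = url; this.date = date; }
}

class TreeSetDemo {
    public static void main(String[] args) {
        // ordered by date, while "equality" was intended to be by URL
        TreeSet<PageVisitedRecord> set =
                new TreeSet<>((a, b) -> Long.compare(a.date, b.date));
        set.add(new PageVisitedRecord("example.com/page", 100));
        set.add(new PageVisitedRecord("example.com/page", 200));
        System.out.println(set.size()); // prints 2: the "update" became a duplicate
    }
}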
"There is not space on this page to discuss the problem for every given permutation of assumptions."
That, good sir/madam, is very well put.
Two things. One is that you should compare how long it takes to run this program on, say, 100000 random inputs (with the functions getting the same random inputs; store them).
Second is that n = 20 might be a little small for the fast exponentiation approach to really have a dramatic benefit. Nonetheless, I think it will be faster when tested properly.
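For reference, a minimal sketch of the fast exponentiation being compared (overflow ignored). It does O(log n) multiplications by repeated squaring, versus n - 1 multiplications for the naive loop:

static long power(long base, int exp) {
    long result = 1;
    while (exp > 0) {
        if ((exp & 1) == 1) result *= base; // fold in the current bit of the exponent
        base *= base;                       // square for the next bit
        exp >>= 1;
    }
    return result;
}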
"Please let me know if this is right choice"
Only you can decide that. Do you like web programming?
If it interests you, there's definitely plenty of demand for web programmers out there. If you're good, and especially if you have a portfolio (or can make one), you should be able to find a good job.
I'm afraid you'll generally have to not write in Lisp during interviews. Learn Java, C++, or C. Furthermore, many interviews require familiarity with object-oriented programming, so learning Java and/or C++ will help you there too.
- eugene.yarovoi September 01, 2012
@LoocalVinci: That's not correct. The correct recurrence for this solution is F(N) = 3*F(N/2) + C, where N = n^3. This evaluates, like I said above, to approximately O(n^4.75).
If you modified this solution to get a better recurrence, you could possibly get something like F(N) = 7*F(N/8) + C. But this is very different from F(N) = F(7N/8) + C! The former has performance N ^ (log base 8 of 7). Since N = n^3, this evaluates to about O(n^2.81), a little better than the brute-force n^3 search and considerably worse than the n^2 algo discussed in the top-voted answer.
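For concreteness, this is just the Master Theorem applied with a constant f(N) = C: F(N) = 3*F(N/2) + C solves to Theta(N ^ (log base 2 of 3)) ≈ Theta(N^1.585), and substituting N = n^3 gives roughly O(n^4.75). Likewise, F(N) = 7*F(N/8) + C solves to Theta(N ^ (log base 8 of 7)) ≈ Theta(N^0.936), which with N = n^3 is roughly O(n^2.81).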
Hmmm, you appear to be right. I'm really surprised that Java would use something like an LCG... I thought they used Mersenne Twister.
- eugene.yarovoi August 31, 2012
I'm a bit horrified to see the addresses returned by malloc used for random number generation.
- eugene.yarovoi August 31, 2012
Nope. Here we have additional constraints, like the number of students per school.
- eugene.yarovoi August 31, 2012
I think you misunderstand the question. A[0][0][N-1] is not guaranteed to be smaller than A[0][1][0]. The only sortedness property here is that if you have two positions whose coordinates differ in only one dimension, the position with the greater value of the differing dimension holds a value >= to the position with the lesser value of the differing dimension.
- eugene.yarovoi August 31, 2012
The good news is that it's really easy to see the correctness of this approach. The bad news is that the worst-case complexity here is at least quadratic.
- eugene.yarovoi August 31, 2012
I don't know what to tell you. In situations like this, my advice would be to ask the interviewer what requirements they have that aren't being met by the current solution. What's the business problem, and in what way is it not being solved by the current solution?
- eugene.yarovoi August 30, 2012
I said that generally primitives have to be aligned to start on an address that is a multiple of the lesser of the size of the primitive and the machine alignment size. I'd like to explain why this is so.
Generally, machines read data only in chunks that are the machine alignment size and only at addresses that are multiples of the machine alignment size. When your alignment size is 4, any read of a lower power of 2, like a 2-byte or 1-byte datatype, results in a full 4-byte read that is then appropriately AND-masked or shifted to get just the part you want. When the address of a primitive is a multiple of the size of the datatype, any such read that involves shifting is guaranteed to need only the data from one 4-byte chunk. Let's say the readable 4-byte chunks are bytes 0-3 and bytes 4-7. If you positioned your 2-byte short int starting at byte 3 and extending into byte 4, the architecture would have to read both 4-byte locations and combine them with a lot of masking and shifting to get your short. But if you position your short at a multiple of its size, 2 (so address 0, 2, 4, or 6 here), the read can always be completed with just one read and a simple mask operation.
The architecture could allow you to do inefficient reads that require reading 2 locations and provide the wiring for that, but that would require extra logic and circuitry for something that's inefficient anyway. So most architectures require alignment and throw bus errors when you don't meet the alignment criteria.
When you have types > 4 bytes on 4-byte alignment machines, multiple locations have to be read anyway, and the large datatypes are required to be aligned to a multiple of the machine alignment size to ensure that, for instance, reading a double requires at most 2 reads and simple shifting instead of 3 reads and complex shifting (how many 4-byte blocks must be read to read a double that starts at byte 3? 0-3, 4-7, and 8-11.) Some architectures may have an 8-byte mechanism for reading these types and require alignment to 8-byte multiples for these types.
The best strategy for alignment? Align all types to addresses that are multiples of the lesser of their size or the maximum read size on a machine, which for current architectures is usually 8 bytes.
It should be 4 bytes, unless your system makes it a rule to align all structs to 8 bytes. char datatypes can generally be read at any byte address, because data generally has to be aligned only to the lesser of the size of the datatype in question and the word alignment size of the machine, and sizeof(char) is 1.
- eugene.yarovoi August 29, 2012
There are also faster algorithms. There's a simple-to-understand one (Karatsuba's algorithm) that runs in n ^ (log base 2 of 3), where n is the size of m, and then there's the fast Fourier transform, which runs in n*log(n).
- eugene.yarovoi August 28, 2012
Actually, you need at least O(m*n*log(n)) time.
- eugene.yarovoi August 28, 2012
Since the if condition will almost always be false, the branch predictor should be able to pick up on that, guess false all the time, and make this common situation cheap. The branch will only be mispredicted when the condition is true, which will likely be once in a rare while.
Furthermore, it's likely that the compiler will eliminate the if statement altogether, replacing it with a special conditional set instruction available on many modern architectures, an instruction that either does nothing or sets a register to a value, depending on a condition flag. If this happens, there will be no need to have the possibility of an instruction jump. The code could look like (in pseudo-assembly):
[tmp, size, tail are placeholders for the registers that hold these values, let's say]
add tmp, tail, 1 // tmp = tail + 1
cmp tmp, size //sets the "Equal" condition flag to 1 iff tail+1 == size
csete tmp, 0 // Conditional SET if Equal. If "Equal" condition flag is 1, set register tmp to 0.
In hardware, these sorts of conditional instructions work simply by having some AND and OR logic gates to apply a bitmask based off the condition flag.
In fact, the operation could even have been written in software (not that I recommend such code):
tmp = tail + 1;
//right arithmetic shift. If tmp == size, it evaluates to a mask of all 0s; if tmp < size, all 1s. INTEGER_BIT_WIDTH is 32 for 32-bit integers, 64 for 64-bit integers, etc.
conditionMask = (tmp - size) >> (INTEGER_BIT_WIDTH - 1);
// apply the mask to tmp, either clearing tmp or not changing it
tmp &= conditionMask;
Won't work. Something could be repeated 3, 5, 7, etc. times and influence the XOR.
- eugene.yarovoi August 28, 2012
Binary search on the answer. Guess an answer, square it, and see if the square is too big or too small. This approach can be used for any monotonically increasing function.
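A minimal sketch of that guess-and-check as a binary search for the integer square root (the method name is mine; dividing instead of squaring avoids overflow):

static long intSqrt(long n) {
    long lo = 0, hi = n;
    while (lo < hi) {
        long mid = lo + (hi - lo + 1) / 2; // round up so the loop always shrinks
        if (mid <= n / mid) lo = mid;      // equivalent to mid*mid <= n
        else hi = mid - 1;
    }
    return lo; // floor of sqrt(n)
}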
- eugene.yarovoi August 28, 2012
Just because a treap is a tree with some random structure doesn't mean that it's the solution to this question involving trees and probabilities. The solution here is deterministic because we're asked to minimize an expectation.
The DP approach is the right idea.
Sounds like N log N time, not log N. Huge difference.
- eugene.yarovoi August 17, 2012
I'd like to see some explanation of how an O(n) algorithm for this problem follows from an O(n) solution for longest palindrome. I don't think it does, given that there can be multiple palindromes having different centers.
- eugene.yarovoi August 16, 2012
If that's what the hash is being used for, then it should probably be an array of HashSet<Integer>, where array[i] is a set of the stations that station i is connected to.
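A minimal sketch of that representation, with hypothetical names, assuming connections are bidirectional:

import java.util.HashSet;

class StationGraph {
    @SuppressWarnings("unchecked") // generic array creation is unchecked in Java
    static HashSet<Integer>[] build(int n, int[][] connections) {
        HashSet<Integer>[] adj = new HashSet[n]; // adj[i] = stations connected to station i
        for (int i = 0; i < n; i++) adj[i] = new HashSet<>();
        for (int[] c : connections) {
            adj[c[0]].add(c[1]);
            adj[c[1]].add(c[0]);
        }
        return adj;
    }
}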
- eugene.yarovoi August 15, 2012
What type is ptr in this test that you did? The allocation shouldn't be size+1 bytes, but size+sizeof(void*) or something like that (it's compiler-dependent, though). Did you cast the value returned by malloc to int* first? In that case, ptr[-1] actually reads the int at returned_ptr - sizeof(int), which is what I would expect.
- eugene.yarovoi August 15, 2012
@anshulzunke: Why do you choose a hash table over a 2D array?
@Barney: You don't have to go through all the rows in a database to find the match. You can use indexes.
@ashutosh: So? How efficient do you think the top solution is? Like most other solutions on this page, it's O(m*n).
- eugene.yarovoi August 15, 2012
What if all three of a, b, c have the same least significant bit? Then you also don't have a bit to partition on.
- eugene.yarovoi August 13, 2012
I think dynamic programming is a little excessive here. Dynamic programming is most appropriate when you have subproblems whose solutions are re-used multiple times. Here there's no such thing. KSum(i,j) is only used in evaluating KSum(i+1, j).
You're better off just evaluating your recurrence iteratively. Of course, you never say in your video how you intended to evaluate the recursion, so maybe your plan was to decompose it into iteration (and also not memoize solutions to subproblems, since that would also be a waste).
Yes, this is another way of solving it. You should just say it's O(MN) -- that's still correct and is simpler.
- eugene.yarovoi August 13, 2012
I don't think there's a way to easily recover from the issue I pointed out earlier: that all bets are off when you're XORing 3 or more numbers. Therefore, downvoting because this approach seems fundamentally broken.
- eugene.yarovoi August 12, 2012
That's an interesting technique. I actually used something similar to it in one of my projects about a year back, but not exactly like that. My technique wouldn't have been powerful enough for this, but this technique actually requires you to store an integer for every possible value, so now you need something like 16GB (4 billion possible values * 4 bytes/int) + 4*n bytes (dense set representation) + 1 GB (size of the bitset from our discussion before) > 17 GB extra space!
Thanks for teaching me about this technique, though. It's still a useful trick to keep in mind and consider using at times.
And then the language is not irrelevant. You need to use a language that doesn't clear arrays to 0 before handing them to you.
If you read the Wikipedia article, there's an algorithm that doesn't require hashing that also achieves O(n^2), except with a lower constant factor and deterministically.
- eugene.yarovoi August 12, 2012"Two bits because "Every number appears twice, except three of them" does not imply those three occur exactly once."
Very true. So the two bits are being used to distinguish between zero, one, two, and more than two occurrences.
"The calculation might be off, but you can have arrays (with default value 0) without actually initializing it. "
No, you can't. Either your language zeros out arrays or it doesn't. If it does, it costs you no matter what. If it doesn't, you must do it. I'm also concerned about the unreasonable memory footprint in addition to performance.
@Sunil: No. You misunderstood the problem. There are three different numbers that occur once (or, depending on your interpretation, either exactly once or more than twice).
You need to know whether the number occurred not at all, once, or twice. That's 3 logical possibilities, so we need 2 bits per number.
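A minimal sketch of such a 2-bit counter using two bitsets (all names mine). Each value's pair of bits saturates at "seen twice", and for the full 32-bit universe the two arrays together come to the 1 GB / 2-bits-per-number figure discussed above:

class TwoBitCounter {
    private final long[] low, high; // per value: 00 = unseen, 01 = once, 10 = twice (or more)

    TwoBitCounter(long universeSize) {
        int words = (int) ((universeSize + 63) / 64);
        low = new long[words];
        high = new long[words];
    }

    void add(int value) {
        int word = value >>> 6;        // treat the int as unsigned
        long bit = 1L << (value & 63);
        if ((high[word] & bit) != 0) return; // already saturated at "twice"
        if ((low[word] & bit) != 0) {        // seen once -> now twice
            low[word] &= ~bit;
            high[word] |= bit;
        } else {
            low[word] |= bit;                // first occurrence
        }
    }

    boolean seenExactlyOnce(int value) {
        return (low[value >>> 6] & (1L << (value & 63))) != 0;
    }
}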
- eugene.yarovoi August 12, 2012
Heapsort is difficult to do on something that doesn't have random access, like a linked list. And even if you could do it, why would it be the best?
- eugene.yarovoi September 03, 2012