wgestrich
@jakubadamek: So each machine is using the above code to return the median of its data. But when a machine obtains the median of its set, how are the M results used to get the global median? Are the recursive calls being delegated from a leader to other machines?
- wgestrich August 10, 2013
If we merge N sorted lists, isn't that going to take roughly 1,000,000,000 / 2 time and space? This assumes the merge is coordinated by one of the machines. I'm comparing this to @annon's approach, which would take log(1,000,000,000). It seems we want to avoid any operation that requires the entire data set to be iterated (merged) by a single machine. Maybe I missed something, though.
- wgestrich August 10, 2013
This sounds like a good approach. I'm trying to think about the efficiency.
m = number of machines
n = total count of numbers (a billion)
The workers each sort their own numbers. This is n * log(n) in total processing time, but at the worker level it is only (n / m) * log(n / m), done concurrently.
When the leader asks for the number of strictly bigger numbers, each worker can find its count in log(n / m) time by binary search, since its numbers are sorted.
The number of leader requests for strictly-greater counts should be log(n) on average. I believe log(n) since each request eliminates about half of the remaining numbers (for the same reason quicksort eliminates about half on average with a random pivot).
Summary:
Each worker's sorting: (n / m) * log(n / m)
Each worker's count of bigger numbers: log(n / m) per request
Number of leader requests for strictly-greater counts: log(n) on average
Sound right?
The question could be clearer. This is how I interpret it:
- You are given a choice of tiles of different dimensions and prices.
- The available quantity of each tile is unlimited.
- You are not limited to a single tile type in a configuration; you can use a mix of tile sizes.
- The goal is to tile the entire floor at minimum cost, with no gaps at the walls. The tiles have to fit exactly, since you can't cut them.
The performance is linear, since the number of index evaluations is at most 2n, where n is the size of the array.
This can be proven by contradiction.
Suppose the performance is 3n or worse.
That implies that there is at least some index 'j' that you evaluate 3x or more. (some index you loop over three times)
In order to evaluate 'j' 3 or more times, there must be 3 indices that you jumped from when evaluating 'j'. Let those 3 indices be 'a', 'b' & 'c', encountered in that order.
{ … a … b … c … j ... }
'c' must allow you to jump farther in array than 'b' since you chose 'c' after 'b'.
But 'c' would have been evaluated on both the iteration from 'a' and the iteration from 'b'. This must be true because the index of 'j' is greater than 'a', 'b' and 'c'. Since you encountered 'j' on all 3 iterations, you certainly encountered 'c' from the first 2 (from a & b)
This is a contradiction because if c gets you farther than b, then you would have chosen c when iterating from a.
The contradiction proves that there is no index j that you evaluate 3 or more times. So each array index is evaluated at most twice; therefore, the performance is O(2n) = O(n).
My implementation of the greedy solution suggested by @EOF. The runtime is O(N * L), where N is the size of the array and L is the average length of the jumps.
#include <stdio.h>
int leastJumps(int *array, int array_length);
int main(){
    int jumps[] = {2, 5, 0, 0, 0};
    int least_jumps = leastJumps(jumps, 5);
    printf("Least jumps: %d\n", least_jumps);
    return 0;
}
int leastJumps(int *array, int array_length){
    int jumps = 0;      // total number of jumps to reach the end
    int curr_index = 0;
    if(array_length <= 1) return 0; // already at the end
    while(1){
        int max_jump_length = array[curr_index]; // maximum distance you can jump from array[curr_index]
        if(curr_index + max_jump_length >= array_length - 1){
            // Can reach the end of the array with 1 more jump from this position
            return jumps + 1;
        }
        if(max_jump_length == 0){
            return -1;  // Stuck: can't reach the end
        }
        // Greedy choice: jump to the index that lets us reach farthest (j + array[j])
        int best_index = curr_index + 1;
        for(int j = curr_index + 1; j <= curr_index + max_jump_length; j++){
            if(j + array[j] > best_index + array[best_index]) best_index = j;
        }
        curr_index = best_index; // jump to the best index
        jumps++;
    }
}
Just to add some clarity to the question, as it wasn't immediately clear to me -- I believe the example dataset {{1,2}, {3,4}, {6}} represents 3 candy jars. The numbers inside represent types of candy, so this could be {{Snickers, Reese's}, {Kit-Kat, 100 Grand}, {Butterfinger}}. It looked like a count at first glance, which is another problem altogether.
- wgestrich August 05, 2013
This is a great answer, but here's a possible optimization. In the suggested implementation, you check for the next index that doesn't match in each iteration, swap ZERO into that position, then swap the correct element in for ZERO. So there are 2 swaps per mismatch. I believe 2 swaps are not always necessary.
- wgestrich August 17, 2013
Instead, you could find ZERO in src and swap in the tgt value for that position. Do this in each iteration. That way, you are just moving the correct value into ZERO's current position: one swap per incorrectly placed item.
The special case for this is when you swap ZERO into its correct position. In this case, you can't swap the correct number in, because it is already there -- it just happens to be ZERO. Instead, you can swap some other incorrect number into ZERO's position. (just move misplaced number into a different misplaced position -- ZERO's position) In the next iteration after doing this fix, it is guaranteed that you will swap some number into its correct place. So even when this special case occurs, it will still only take 2 swaps to move 1 misplaced number.
I still think your solution is more suitable for answering this question, since it is simpler to understand. However, I just wanted to point out that 2 swaps are not necessarily required :)