I don't think this works; I don't even think the solution you gave is the right one. The weight of node 40 should be its own weight plus that of its subtree, so w(40) + w(50) = 3 + 4 = 7. Following this train of thought, the solution would be:
node 0: tree weight is 20
node 30: tree weight is 17
node 40: tree weight is 7
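If the node weights and child lists were stored in plain dictionaries, the subtree weights above follow from a short recursion. The weights and links below are my own reconstruction from the numbers in the example (a chain 0 -> 30 -> 40 -> 50), not part of the original problem:

```python
# Hypothetical weights/children reconstructed from the example:
# w(0)=3, w(30)=10, w(40)=3, w(50)=4, chained 0 -> 30 -> 40 -> 50.
weights = {0: 3, 30: 10, 40: 3, 50: 4}
children = {0: [30], 30: [40], 40: [50], 50: []}

def tree_weight(node):
    """A node's own weight plus the total weight of its subtree."""
    return weights[node] + sum(tree_weight(c) for c in children[node])
```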
What I would do, since clearly I cannot load both files into memory (probably not even one of them alone): I would still sort them (external merge sort; check Wikipedia for that). Then I load as much from each file as I can fit into memory, for example 512 KB from each, and perform a merge step where I only keep the last element and the list it came from; when equal elements come from different lists, I add the value to my result set. When a chunk is exhausted, I load the next chunk.
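Assuming the goal is to report values present in both files, the merge step over two sorted streams can be sketched like this. The iterators stand in for the 512 KB chunks being refilled from disk, and `None` is used as an end-of-stream marker, so this assumes the data itself never contains `None`:

```python
def common_sorted(a, b):
    """Yield values present in both sorted iterables, streaming."""
    it_a, it_b = iter(a), iter(b)
    x = next(it_a, None)
    y = next(it_b, None)
    while x is not None and y is not None:
        if x == y:
            # Same value arrived from both lists: it belongs to the result.
            yield x
            x = next(it_a, None)
            y = next(it_b, None)
        elif x < y:
            x = next(it_a, None)
        else:
            y = next(it_b, None)
```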
- sambatyon December 05, 2013

We had this problem at the office. The best way to do it is to arrange your nodes as a binary tree (a heap representation will do):
          Node1
     Node2     Node3
  Node4 Node5 Node6 Node7
At T-0, Node1 passes the data to Node2; at T-1, Node1 passes data to Node3 and Node2 passes data to Node4; at T-2, Node2 passes data to Node5 and Node3 passes data to Node6; at T-3, Node3 passes data to Node7.
There's even an optimization in which not all the data is passed in each step but just chunks, and as soon as a node holds a chunk, it starts passing it on to the next node. That vastly reduces the amount of data passed over the network at each step, reducing congestion and increasing throughput. The only problem is I cannot remember the details of the optimization, but it is based on how Open MPI implements MPI_Bcast.
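The fixed schedule described above (each heap-numbered node forwards to its left child one step after receiving, and to its right child the step after that) can be computed explicitly. This is a sketch of the timing only, not the chunked Open MPI optimization:

```python
def broadcast_schedule(n):
    """Map step -> [(sender, receiver), ...] for heap-numbered nodes 1..n."""
    recv = {1: -1}  # the root already holds the data before step 0
    schedule = {}
    for i in range(1, n + 1):
        # Node i sends to child 2i one step after receiving, to 2i+1 two after.
        for offset, child in ((1, 2 * i), (2, 2 * i + 1)):
            if child <= n:
                step = recv[i] + offset
                recv[child] = step
                schedule.setdefault(step, []).append((i, child))
    return schedule
```

For seven nodes this reproduces exactly the T-0 through T-3 transfers listed above.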
What a headache, but I managed it. It returns an integer whose bits correspond to the base -2 (negabinary) encoding, and it does it in O(b) iterations, where b is the position of the highest bit:
import math

def convert(x):
    """Return an int whose bits are the base -2 (negabinary) digits of x."""
    if x == 0:  # log2 is undefined for 0, so handle it up front
        return 0
    res = 0
    while True:
        sign = 1 if x >= 0 else -1
        # p = position of the highest bit needed to cover |x|
        p = math.floor(math.log2(abs(x)))
        if 2**p < abs(x):
            p += 1
        # Bit p has weight (-2)^p; if that weight has the wrong sign for x,
        # also set bit p+1, whose weight cancels the excess.
        if (p % 2 == 0 and sign < 0) or (p % 2 == 1 and sign > 0):
            res += 2**(p + 1)
        x = -(sign * 2**p - x)
        res += 2**p
        if x == 0:
            break
    return res
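A quick way to sanity-check an encoder like this is to decode the result back: interpret bit p of the output as the digit for (-2)^p and sum the weights.

```python
def decode_negabinary(res):
    """Interpret the bits of res as base -2 digits and return the value."""
    value, place = 0, 1  # place walks through (-2)^0, (-2)^1, ...
    while res:
        if res & 1:
            value += place
        place *= -2
        res >>= 1
    return value
```

For example, 0b110 decodes to 4 - 2 = 2 and 0b11 decodes to -2 + 1 = -1.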
It assumes that probs[i] is the probability of getting strings[i], which means probs.size() == strings.size() and sum(probs) = 1.0
#include <random>
#include <string>
#include <vector>

std::string GetRandomString(const std::vector<std::string>& strings,
                            const std::vector<double>& probs) {
  // Build the cumulative distribution function.
  std::vector<double> density;
  density.push_back(probs[0]);
  for (std::size_t i = 1; i < probs.size(); ++i)
    density.push_back(probs[i] + density.back());
  // Draw a uniform value in [0, 1) and find the first CDF entry above it.
  static std::mt19937 gen{std::random_device{}()};
  double r = std::uniform_real_distribution<double>{0.0, 1.0}(gen);
  std::size_t index = 0;
  while (index + 1 < density.size() && r > density[index])
    ++index;
  return strings[index];
}
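The same cumulative-distribution idea can be written in Python with bisect, replacing the linear scan with an O(log n) lookup. The optional rnd parameter is only there so the draw can be fixed in a test; it is not part of the original:

```python
import bisect
import itertools
import random

def get_random_string(strings, probs, rnd=None):
    """Pick strings[i] with probability probs[i] via the running CDF."""
    density = list(itertools.accumulate(probs))
    r = random.random() if rnd is None else rnd
    # Clamp in case floating-point error leaves density[-1] slightly below 1.0.
    return strings[min(bisect.bisect_right(density, r), len(strings) - 1)]
```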
I assume that each number appears in k at most once, that k is sorted, and that k.size() < n. The idea is to return the i-th allowed number, where i is a random number in the interval [0, n - k.size()).
#include <cstdlib>
#include <vector>

int RandomNumber(int n, const std::vector<int>& k) {
  // Pick the index of an allowed number uniformly from [0, n - k.size()).
  int res = std::rand() % (n - static_cast<int>(k.size()));
  // Shift the candidate past every forbidden value that is <= it;
  // k must be sorted for this to work.
  std::size_t index = 0;
  while (index < k.size() && res >= k[index]) {
    ++res;
    ++index;
  }
  return res;
}
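The same skip trick in Python; the optional i parameter stands in for the random draw so the index-to-value mapping can be checked deterministically (it is not part of the original):

```python
import random

def random_excluding(n, k, i=None):
    """Uniform sample from [0, n) excluding the sorted values in k."""
    if i is None:
        i = random.randrange(n - len(k))
    res = i
    # Shift past every excluded value that is <= the current candidate.
    for excluded in k:
        if res >= excluded:
            res += 1
        else:
            break
    return res
```

With n = 10 and k = [2, 5], the allowed numbers are 0, 1, 3, 4, 6, 7, 8, 9, and index i maps straight onto that list.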
I got asked this question. After a long time tracking minimums and maximums, the interviewer asked me why I didn't consider something with an in-order traversal. Then I felt really stupid.
- sambatyon December 09, 2013