## Facebook Interview Question

**Tags:** SDE1s · **Country:** United States

@all: time complexity won't change. It's O(n+m) because you need to look at each element at least once. That is, for 2n data it takes double the time; you won't achieve anything better unless you are a magician ;-)

What can be improved is the constant factor; Fernando actually pointed that out. But if you are running on a single machine (threads, not processes), your bottleneck is disk I/O or memory access.

So, what do we learn? Distribute, shard your big data sets, and use map/reduce. But the Big-O is not changing; it's still O(n). By the way, it's also Ω(n) and hence Θ(n), since it's quite a tight bound.

According to the master theorem, it reduces to Θ(n)?!

There are some small O(n log n) issues which occur in the real world.

Edit: when I say it becomes n from n log n, n is the size of the fully merged array.

Gee whiz, I sure hope you're an avid reader! There's 2 sections...

assumptions:

- the strategy is that one thread can merge a pair of arrays, and thus multiple threads can do some simultaneous merging (and we preserve the sorted quality, obviously)
- for convenience in thinking, assume the k arrays are of equal length
- assume we have more threads and CPUs than we will ever need

=========================

improve the time complexity: just theory section

A single thread does mergesort in O(n log n).

we derived that from the master theorem which DOES account for the halving of array size in the recursion formula.

I would like to account for the changing array size; let's use the master theorem.

(If you ignore the changing array size you get O(log n) for magically uniform merges, hahaha.)

n comparisons, split the thing in half, etc., for merge sort.

Instead of T(n) = 2T(n/2) + n with a single thread

since we do it in parallel, the two recursive halves run simultaneously, so only one of them counts toward the running time

I imagine the formula becomes T(n) = T(n/2) + n

b is still 2

a is now 1, down from 2

Now n^(log_2 1) = n^0 = 1, which is smaller than f(n) = n.

The running time is dominated by the work near the root, so we get case 3 of the master theorem.

The regularity condition a·f(n/b) ≤ c·f(n) easily holds when a = 1 and b = 2: f(n/2) = n/2 = (1/2)·f(n), so c = 1/2 works.

log_2 1 = 0, which is smaller than c = 1, the exponent in f(n) = n^1.

Case 3 says the time complexity = Θ(f(n)) = Θ(n).

Whoa, that's definitely an improvement!
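Putting the whole derivation in one place (my transcription; it assumes the perfectly uniform merges and unlimited threads from the assumptions above):

```latex
% Sequential merge sort: T(n) = 2T(n/2) + n = Theta(n log n).
% In the parallel version both halves run at once, so only one T(n/2)
% stays on the critical path, plus the O(n) merge at the root.
\begin{align*}
  T(n) &= T(n/2) + n, \qquad a = 1,\ b = 2,\ f(n) = n \\
  n^{\log_b a} &= n^{\log_2 1} = n^0 = 1 \\
  f(n) = n &= \Omega\bigl(n^{0+\varepsilon}\bigr)\ \text{for } \varepsilon = 1
    \quad \text{(case 3 applies)} \\
  a\,f(n/b) &= \tfrac{n}{2} = \tfrac{1}{2} f(n)
    \quad \text{(regularity, } c = \tfrac{1}{2} < 1\text{)} \\
  \Rightarrow\ T(n) &= \Theta(f(n)) = \Theta(n)
\end{align*}
```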

=========================

"if so, how?": an implementation

Implementation-wise I'd have a queue of arrays that need to get worked on; new arrays fresh from a merge get reinserted. A nice way of dealing with unbounded-buffer, producer-consumer situations is to employ a semaphore.
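Something like this minimal Java sketch (the names `ParallelMerge`, `mergeTwo`, `mergeAll` are mine, and it leans on the equal-length, k-arrays assumptions above): a counting semaphore tracks how many arrays sit in the buffer, and a plain mutex guards the queue itself.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelMerge {
    private final Deque<int[]> buffer = new ArrayDeque<>();    // unbounded buffer of sorted arrays
    private final Semaphore items;                             // counts arrays sitting in the buffer
    private final AtomicInteger claimed = new AtomicInteger(); // merges claimed so far
    private final int k;                                       // number of input arrays

    ParallelMerge(List<int[]> arrays) {
        buffer.addAll(arrays);
        items = new Semaphore(arrays.size());
        k = arrays.size();
    }

    // Each worker claims merges off an atomic counter; with k inputs there are
    // exactly k - 1 merges, so the counter doubles as the "are we done" test.
    private void worker() {
        while (claimed.getAndIncrement() < k - 1) {
            try {
                items.acquire(2);                  // block until two arrays are available
                int[] a, b;
                synchronized (buffer) {            // ONE thread at a time on the queue itself
                    a = buffer.pop();
                    b = buffer.pop();
                }
                int[] merged = mergeTwo(a, b);     // the actual O(|a| + |b|) work, done unlocked
                synchronized (buffer) { buffer.push(merged); }
                items.release();                   // announce the freshly merged array
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Standard linear two-way merge of two sorted arrays.
    private static int[] mergeTwo(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) out[n++] = a[i] <= b[j] ? a[i++] : b[j++];
        while (i < a.length) out[n++] = a[i++];
        while (j < b.length) out[n++] = b[j++];
        return out;
    }

    public int[] mergeAll(int nThreads) throws InterruptedException {
        Thread[] ts = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) { ts[t] = new Thread(this::worker); ts[t].start(); }
        for (Thread t : ts) t.join();              // every worker exiting means all k - 1 merges are done
        return buffer.pop();                       // the single fully merged array
    }

    public static void main(String[] args) throws InterruptedException {
        List<int[]> input = Arrays.asList(
                new int[]{1, 4, 7}, new int[]{2, 5, 8}, new int[]{0, 9}, new int[]{3, 6});
        System.out.println(Arrays.toString(new ParallelMerge(input).mergeAll(4)));
        // -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

Note the atomic `claimed` counter is also the termination test discussed below: once all k - 1 merges are claimed, workers simply stop.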

this introduces some rare small O(n log n) situations...

In the super rare case where threads finish their merges simultaneously, they will have to insert/pop the queue ONE thread at a time, since the semaphore's counter must not have a race condition. There are still O(n log n) points at which threads can get blocked on the lock.

Also, dammit, we need to be able to tell when we're finished.

So then we'd need a counter for how many merges we've done in order to deduce that we've finished, which would also be incremented O(n log n) times, plus O(n) to actually calculate the number of merges done when complete.

Again, these are really small things, but they could dominate eventually haha. I'd test that by getting employed and using company infrastructure if it ever practically got that big ;)

Assuming a multi-core environment in which n is the number of cores/threads, using several threads wouldn't improve the big-O notation, as in a perfect setup it would only provide a speedup of n, which is a fixed number. Also, this problem doesn't offer a perfect setup: when k < n you can't use all the available threads/cores, and eventually k will be less than n as you are merging arrays.

- Fernando May 18, 2017

In practice, on a typical architecture, you can achieve important improvements, maybe by orders of magnitude, as you can usually manage access to main memory more efficiently using several threads/cores.

Edited: The previous solution assumes a real scenario in which the number of threads/cores is less than the number of arrays. The maximum theoretical speedup when sorting in parallel with n cores/threads leaves a running time of O(n log n) / n => O(log n). As previously stated, even in this ideal scenario you can't do that with merge sort, as the number of effective threads halves at each level. For merge sort, in the ideal case, yes, you can obtain a linear cost: O(n log n) / log n = O(n).
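Spelling out those two bounds (my notation; p is the thread count):

```latex
% Perfect work-splitting with p threads gives T_p >= T_1 / p, so with
% p = n the floor is O(n log n)/n = O(log n). Merge sort can't hit it:
% the effective thread count halves at every level, and the final merge
% alone is a Theta(n) critical path, so
\begin{align*}
  T_p \ \ge\ \max\!\left(\frac{n \log n}{p},\ n\right)
  \quad\Longrightarrow\quad
  T_p = \Theta(n)\ \text{at best, a speedup of } \frac{n \log n}{n} = \log n .
\end{align*}
```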

@Chris.k It depends on the architecture. But let's assume that reading from memory takes x cycles; this number has to be much higher than the cost of performing a comparison, which we'll call y. You can reserve one core to read from memory, reading (x/y) * (n - 1) blocks each time; that way, for the rest of the (n - 1) cores, access to memory is basically free, as by the time they finish their y comparisons they have a new chunk to compare. Perhaps I am missing something here. What do you think?
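To put made-up numbers on that (purely illustrative, not from the thread): say x = 100 cycles per memory read, y = 1 cycle per comparison, and n = 9 cores, one of them the dedicated reader:

```latex
% Lookahead the reader core maintains so the n - 1 compare cores never stall:
\[
  \frac{x}{y}(n - 1) \;=\; \frac{100}{1} \times 8 \;=\; 800\ \text{blocks per round.}
\]
```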