Google Interview Question for Software Developers


Team: Developer Tools
Country: United States
Interview Type: Phone Interview





Sum is a write-only variable, and the order of the writes doesn't matter. You can send jobs to several multithreaded servers and accumulate the results: a divide-and-conquer multithreading optimization.
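
A minimal Java sketch of the order-independence point, with one machine's threads standing in for the several servers (the LongAdder accumulator and the 1..1,000,000 workload are illustrative assumptions, not from the original):

import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

public class OrderFreeSum {
    public static void main(String[] args) {
        LongAdder sum = new LongAdder(); // shared, add-only accumulator
        // Worker threads add their values in whatever order they finish;
        // addition is commutative and associative, so the order is irrelevant.
        IntStream.rangeClosed(1, 1_000_000).parallel().forEach(sum::add);
        System.out.println(sum.sum()); // 500000500000 every time
    }
}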

- lalalka March 25, 2017

The for loop blocks on each sumOfFile call where it could instead run in parallel: submit each file to a thread pool so that the next file can be processed without waiting for the current file's sumOfFile call to complete (see the sketch below).
Instead of holding an entire file in memory, read the file in segments and do the summation as you go until you reach the end.
Once you're out of threads in the thread pool, the processing of the remaining files waits until a worker gets free to pick up the next file.
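
A hedged Java sketch of the thread-pool idea; the directory-of-files input, the whitespace-separated file format, and the sumOfFile helper shown here are assumptions, since the question's original code isn't reproduced in this thread:

import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

public class PooledFileSum {
    // Hypothetical stand-in for the question's sumOfFile: scans one
    // whitespace-separated integer at a time instead of loading the file.
    static long sumOfFile(Path file) throws IOException {
        long sum = 0;
        try (Scanner in = new Scanner(file)) {
            while (in.hasNextInt()) sum += in.nextInt();
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> dir = Files.newDirectoryStream(Paths.get(args[0]))) {
            dir.forEach(files::add);
        }
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        List<Future<Long>> partials = new ArrayList<>();
        for (Path f : files) {
            partials.add(pool.submit(() -> sumOfFile(f))); // submit returns immediately
        }
        long total = 0;
        for (Future<Long> p : partials) {
            total += p.get(); // block only when collecting the results
        }
        pool.shutdown();
        System.out.println(total);
    }
}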

- Antriksh March 25, 2017

Multiple issues:
1) Performance: assume 1M numbers take 10ms to add. Sequential summing on a single machine means 10,000 * 10ms = 100s. Too slow.
2) Overflow: the total here is 10 billion integers, so you can't bank on the int max of ~2.15 billion. A long is safer at a max of ~9.2 billion billion, but a worst case of 10 billion values each near the int max can exceed even that, so the safest choice is BigInteger. A long can still be faster, so decide the trade-off based on the data range (see the sketch below).
3) Beware of OOM due to heap size limits. With ints, each data set is 4MB+ (4 bytes per int plus object overhead); with longs, ~16MB. Plan for more with BigInteger, as it internally uses an int[].

So assuming the max heap is configured at 8GB and 50% of it is available to you, you can process 4GB / 16MB ~ 250 files per node in a multi-threaded implementation, which needs 40 nodes.
Assume each node takes 20-30ms to do its share. Have a coordinator delegate jobs to the 40 machines with file paths (each machine can read its target files directly from a central mount location). Kick them off in parallel and sum up the 40 results. At 20-30ms per node plus, say, 2-5ms of network latency, you can get the final answer in ~35-40ms.
Optimize further as per your performance SLA.
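
A minimal sketch of the overflow handling in point 2, assuming each node's partial sum fits in a long (250 files * 1M ints * the int max is ~5.4e17, well under the long max of ~9.2e18) and only the grand total needs BigInteger; the partial values below are made up for illustration:

import java.math.BigInteger;

public class CombinePartials {
    // Each element is one node's partial sum; a long holds it safely.
    static BigInteger grandTotal(long[] nodePartials) {
        BigInteger total = BigInteger.ZERO;
        for (long partial : nodePartials) {
            // BigInteger rules out overflow when the 40 partials are combined.
            total = total.add(BigInteger.valueOf(partial));
        }
        return total;
    }

    public static void main(String[] args) {
        long[] partials = { 540_000_000_000_000_000L, 540_000_000_000_000_000L };
        System.out.println(grandTotal(partials)); // 1080000000000000000
    }
}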

Comments welcome!!

- gg_poks June 11, 2018

Let's assume we have 1 node with 1GB of main memory.
--> Suppose we have just one file with a million integers. Rather than loading all of them into main memory at once, since each integer is 4 bytes, we can read around 250,000 integers at a time and compute their sum. We need to store the sum in an arbitrary-precision form (a string, or more idiomatically a BigInteger) to avoid integer overflow. To speed this up we can use multithreading: one thread computes integers [0..62,500], a second computes [62,501..125,000], and so on (see the sketch below).
--> Now let's assume we have 10 nodes, each with 1GB of main memory.
We can split the files equally among the 10 nodes and run the algorithm above.
In this case each node gets 1,000 of the 10,000 files. After the sum computation is done across the 10 nodes, the final sum is calculated by adding up the total_sum of each node.
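
A hedged Java sketch of the single-node scheme; the binary big-endian file format, the fill helper, and folding the per-segment thread split into a four-thread pool are all assumptions for illustration:

import java.io.*;
import java.math.BigInteger;
import java.util.*;
import java.util.concurrent.*;

public class SegmentedSum {
    static final int SEGMENT = 250_000; // integers read per segment, per the comment

    // Reads up to buf.length big-endian ints; returns how many were read.
    static int fill(DataInputStream in, int[] buf) throws IOException {
        int i = 0;
        try {
            while (i < buf.length) { buf[i] = in.readInt(); i++; }
        } catch (EOFException end) { /* reached the end of the file */ }
        return i;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Long>> partials = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(args[0])))) {
            int[] segment = new int[SEGMENT];
            int n;
            while ((n = fill(in, segment)) > 0) {        // read one segment...
                int[] chunk = Arrays.copyOf(segment, n); // ...copy it for the worker
                partials.add(pool.submit(() -> {         // ...and sum it in parallel
                    long s = 0;
                    for (int v : chunk) s += v;
                    return s;
                }));
            }
        }
        BigInteger total = BigInteger.ZERO; // arbitrary precision, as suggested above
        for (Future<Long> p : partials) total = total.add(BigInteger.valueOf(p.get()));
        pool.shutdown();
        System.out.println(total);
    }
}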

- Adarsh May 10, 2020

/* 
1. Bucketize the files over a thread pool.
2. Shoot.
*/

// number of files per thread 
bucket_size = 100 
// assume the list of files is here 
files // yep, files as in a list of path strings 
num_files = size(files)
def process_files( start_inx ){
  end_inx = start_inx + bucket_size 
  end_inx = end_inx > num_files ? num_files : end_inx
  // sum of the per-file sums over this bucket 
  sum ( [start_inx : end_inx] ) -> { sumOfFile( files[$.o] ) }
} 
num_buckets = ceil ( num_files / bucket_size )
threads = list ( [0:num_buckets] ) -> {
  // give each thread its own start index; a shared running
  // "start_inx += bucket_size" would skip the first bucket and race 
  s_inx = $.o * bucket_size 
  thread( ) -> { process_files( s_inx ) }
}
// do not spin in a busy loop, ever: poll with a timeout instead 
succeeded = poll( 42, 300 ) :: { !exists( threads ) :: { $.o.alive } }
// ZoomBA threads return a value; properly, they are option monads 
assert ( succeeded , "Why did we time out???" )
total_sum = sum ( threads ) -> { $.o.value }
println( total_sum ) // should be cool
- NoOne March 26, 2017

