Google Interview Question for Software Engineer / Developers


Country: United States
Interview Type: In-Person




5 of 5 votes

The problem of generating unique IDs has been solved; all modern OSes have that functionality built in (and many embedded/mobile OSes provide APIs as well).
If asked: UUIDs may be generated from a combination of the system timestamp, CPU/system unique ID, NIC MAC addresses, HBA WWNs, etc.
The real problem is how to serve lots of such IDs, however they are generated.
Questions to ask:
What's the minimum length of the ID?
The requirement states the system should handle at least 1,000 requests/s.
What's the average request rate?
What's the maximum burst rate the system must handle?
What's the maximum latency (wait time) for a request?

Let's assume the following parameters:
What's the minimum length of the ID? <= 128 bits.
In that case I'll choose the standard 128-bit UUID format.
The requirement states the system should handle at least 1,000 requests/s.
What's the average request rate? Between 1,000 and 10,000 requests/s.
What's the maximum burst rate the system must handle? < 1,000,000 requests/s.
What's the maximum latency (wait time) for a request? 1 ms.

It's a classic producer-consumer problem.
In this case we have many consumers of UUIDs and one producer.
All common server OSes have APIs to create UUIDs; for embedded/mobile systems one may have to build that functionality. Let's assume the OS provides an API to generate UUIDs and that the maximum running time of an API call is 1 ms.
First, pre-allocate 2 million UUIDs into an array/stack/heap structure (depending on the implementation).
2 million UUIDs ensures the system can handle the burst rate.
(With no overhead, 2 million UUIDs take ~32 MB of RAM.
Again, not a problem on today's server systems with many GB of RAM, but it may be a problem on an embedded system.)
The solution needs to ensure that its pool of UUIDs doesn't run out as consumers request them, i.e. the pool must be replenished in the background.
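
A rough sketch of that producer-consumer pool in Java, assuming java.util.UUID stands in for the OS-provided generator and a single background producer thread keeps the pool topped up (the 2,000,000 capacity is the burst-rate figure assumed above):

import java.util.UUID;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class UuidPool {
	// Capacity sized for the assumed burst rate (illustrative value).
	private final BlockingQueue<UUID> pool = new ArrayBlockingQueue<>(2_000_000);

	public UuidPool() {
		// One producer thread replenishes the pool in the background.
		Thread producer = new Thread(() -> {
			while (!Thread.currentThread().isInterrupted()) {
				try {
					pool.put(UUID.randomUUID()); // blocks while the pool is full
				} catch (InterruptedException e) {
					Thread.currentThread().interrupt();
				}
			}
		});
		producer.setDaemon(true);
		producer.start();
	}

	// Consumers take a pre-generated UUID; this only blocks if the pool is empty.
	public UUID next() throws InterruptedException {
		return pool.take();
	}
}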

- McGee December 04, 2013 | Flag Reply
1 of 1 vote

UUIDs are not ordered, i.e. given two UUIDs it's impossible to say which one was generated first. So while a UUID gives you exactly what its name implies, a unique ID, this answer doesn't satisfy the other condition: an ever-increasing value.

- DS January 12, 2014 | Flag
4 of 4 votes

This question is quite open-ended.

For starters, how about appending random(N) to the timestamp? N can be large enough to make collisions unlikely. If each server has an ID we can also include it to further reduce collisions.

If IDs must be strictly unique, then I suppose you can design a counter that returns an ID and increments it at the same time. You will then need a mutex to access this counter.
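
A small sketch of the first idea, assuming each server already has a numeric ID (the width of the random suffix is illustrative):

import java.util.concurrent.ThreadLocalRandom;

public class RandomSuffixIdGenerator {
	private final int serverId; // assumed to be assigned per server ahead of time

	public RandomSuffixIdGenerator(int serverId) {
		this.serverId = serverId;
	}

	// ID = <millis timestamp>-<server ID>-<random(N)>, with N = 10^9 here.
	public String nextId() {
		long suffix = ThreadLocalRandom.current().nextLong(1_000_000_000L);
		return System.currentTimeMillis() + "-" + serverId + "-" + suffix;
	}
}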

- Sunny December 04, 2013 | Flag Reply
0 of 0 votes

Good one. A simple hash map with the timestamp as key and an atomic counter as value. Assuming 1,000 requests per second, that's on average 1 request per ms. Standard timestamps can easily be generated down to ms resolution, hence very few collisions. Even if there is more than one request per ms, a mutex on the counter for that timestamp key, covering just 2-3 values, is very cheap computation-wise.
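
A minimal sketch of that idea, keyed by the millisecond timestamp (evicting old entries is omitted here and would still be needed):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class TimestampIdGenerator {
	// One atomic counter per millisecond, as described above.
	private final ConcurrentHashMap<Long, AtomicInteger> counters = new ConcurrentHashMap<>();

	// ID = <millis timestamp>-<sequence within that millisecond>.
	public String nextId() {
		long ts = System.currentTimeMillis();
		int seq = counters.computeIfAbsent(ts, k -> new AtomicInteger()).getAndIncrement();
		return ts + "-" + seq;
	}
}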

- Roxanne December 11, 2013 | Flag
0 of 0 votes

Using random seems quite good, but the problem is that, for example in Java, if you create two Random objects with the same seed they will generate the same values. If you don't provide a seed, it uses the current time in milliseconds, which is actually the same in our case (requests arriving within the same millisecond). Otherwise you would have to synchronize on a shared Random, which seems to be an overhead.
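
A tiny demonstration of the seed problem described above, forcing both instances onto the same seed, as would happen for objects created within the same millisecond:

import java.util.Random;

public class SeedDemo {
	public static void main(String[] args) {
		long seed = System.currentTimeMillis();
		Random a = new Random(seed);
		Random b = new Random(seed);
		// Same seed, therefore identical sequences.
		System.out.println(a.nextLong() == b.nextLong()); // prints true
	}
}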

- Anonymous December 26, 2013 | Flag
2 of 2 votes

Requirements:

-> Unique Incrementing Id
-> Scalable
-> Available
-> Consistent

Solution:

First of all, let's determine a solution that gives us a unique ID starting from 0 and incrementing. We can use the timestamp, converted to epoch time, to get a unique ID.
When we start the system we record a variable called START_EPOCH_TIME.

On each request we calculate CURRENT_EPOCH_TIME - START_EPOCH_TIME.

This gives us unique, incrementing IDs over the scale of the number of digits we use.

However, this won't work when there are multiple requests within the same timestamp. So the first thing we do is extend the length of our ID by the number of requests we want to support per second, i.e. ID000 for 1,000. We then maintain a hash table, recycled at regular intervals, that maps epoch time to the count of requests seen at that time. We update the table on every request:

REQ_COUNT[CURRENT_EPOCH_TIME] += 1

Our unique ID is then ID + str(REQ_COUNT[CURRENT_EPOCH_TIME]).

If this is the only machine handling requests, this works. However, it may not scale well when multiple machines handle requests. So let's say we have 10 machines handling requests. We use a gossip protocol to ensure that all the machines know about each other, so each machine knows exactly how many machines exist. Using this knowledge, each machine assigns itself a range, ordered by the machines' IP/name. So for 10 machines, machine 1 gets 0-100, machine 2 gets 100-200, machine 3 gets 200-300, and so on.

Then the hash table is initialized with:
REQ_COUNT[CURRENT_EPOCH_TIME] = RANGE_START

And incremented as before:
REQ_COUNT[CURRENT_EPOCH_TIME] += 1

Finally, we always check whether REQ_COUNT[CURRENT_EPOCH_TIME] has gone past the end of the machine's range. If so, we throttle the request. If we want to handle more than that, we need to change how the system works, which should be decided based on the requirements and the system's capacity for handling these requests.
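
A rough sketch of the single-machine version under those assumptions (rangeStart and rangeSize model the per-machine range from the multi-machine case; recycling old hash-table entries is omitted):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class EpochRangeIdGenerator {
	private final long startEpochTime = System.currentTimeMillis() / 1000; // START_EPOCH_TIME
	private final int rangeStart;
	private final int rangeSize;
	private final ConcurrentHashMap<Long, AtomicInteger> reqCount = new ConcurrentHashMap<>();

	public EpochRangeIdGenerator(int rangeStart, int rangeSize) {
		this.rangeStart = rangeStart;
		this.rangeSize = rangeSize;
	}

	// Returns "<seconds since start><3-digit sequence>", or null to signal throttling.
	public String nextId() {
		long elapsed = System.currentTimeMillis() / 1000 - startEpochTime; // CURRENT - START
		int seq = reqCount.computeIfAbsent(elapsed, k -> new AtomicInteger(rangeStart))
				.getAndIncrement();
		if (seq >= rangeStart + rangeSize) {
			return null; // over this machine's range for the current second
		}
		return String.format("%d%03d", elapsed, seq);
	}
}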

- Karan Shah November 02, 2014 | Flag Reply
1 of 1 vote

I think this question checks whether you can use thread synchronization.
1. Create a thread for each connection.
2. Create a variable to record the ID; each thread reads it and then increments it, with a lock around it to avoid access conflicts.
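
A minimal sketch of that locked counter, which only works within a single process:

public class LockedCounter {
	private long nextId = 0;
	private final Object lock = new Object();

	// Each handler thread calls this; the lock serializes the read-and-increment.
	public long getId() {
		synchronized (lock) {
			return nextId++;
		}
	}
}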

- jerry.a.yin December 04, 2013 | Flag Reply
0 of 0 votes

In theory it will work if you assume everything runs in the same process space. However, 1,000 requests/s is 1 ms per request, and thread synchronization adds overhead (acquiring and releasing contended locks is relatively expensive), so it may struggle to keep up at that rate. And if you have to spawn multiple processes, the model you presented breaks.

- DS January 13, 2014 | Flag
0 of 0 votes

Typically a server sends back a unique ID for the initial request (the initial session-establishment request). How a connection is accepted depends on the server design; typically the acceptor is a single thread which then passes an accepted connection to workers. A unique ID can be composed of source IP, source port, destination IP, destination port, and the accept timestamp.

- confused_banda December 04, 2013 | Flag Reply
2 of 2 votes

You are confused.

- Anonymous December 04, 2013 | Flag
0 of 0 votes

Suppose there are 2^m servers. Each request is forwarded to a random server, and each server assigns a local ID from 0 to 2^n - 1. The m-bit server ID together with the n-bit local ID forms the complete ID of each request.
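
A sketch of that composition, with illustrative bit widths (the local counter simply wraps at 2^n):

import java.util.concurrent.atomic.AtomicLong;

public class ShardedIdGenerator {
	private static final int LOCAL_BITS = 48; // n
	private final long serverId;              // 0 .. 2^m - 1
	private final AtomicLong local = new AtomicLong();

	public ShardedIdGenerator(long serverId) {
		this.serverId = serverId;
	}

	public long nextId() {
		// High bits hold the server ID, low bits the per-server counter.
		return (serverId << LOCAL_BITS) | (local.getAndIncrement() & ((1L << LOCAL_BITS) - 1));
	}
}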

- Jason December 04, 2013 | Flag Reply
0 of 0 votes

It's still not possible to tell which request came first.

- DS January 13, 2014 | Flag
0 of 0 votes

A unique ID can be composed as {source-ip, source-port, dest-ip, dest-port, protocol}.

We can also add a timestamp to this.
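
As an illustration only, the 5-tuple plus timestamp could be concatenated like this (the fields are placeholders for whatever the server records about the accepted connection):

public class TupleIdGenerator {
	public static String idFor(String srcIp, int srcPort, String dstIp, int dstPort, String protocol) {
		return String.join("-", srcIp, String.valueOf(srcPort),
				dstIp, String.valueOf(dstPort),
				protocol, String.valueOf(System.currentTimeMillis()));
	}
}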

- Lokesh Sharma December 04, 2013 | Flag Reply
0 of 0 votes

How about using the good old AtomicInteger (unless you are afraid of ID wrap-around)?
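
For reference, a one-class sketch of that approach; using AtomicLong instead of AtomicInteger pushes the wrap-around out to 2^63 IDs, but it still only works within one process:

import java.util.concurrent.atomic.AtomicLong;

public class AtomicIdGenerator {
	private final AtomicLong counter = new AtomicLong();

	public long nextId() {
		return counter.getAndIncrement();
	}
}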

- RaeR0aih December 04, 2013 | Flag Reply
0 of 0 votes

What about generating the ID from the timestamp and the ID of the thread that is handling the request?

- GaaraZhu December 04, 2013 | Flag Reply
0 of 0 votes

If the timestamp is only granular to, say, the seconds level, then it's possible that the same thread handles multiple requests within a second and thus they all get the same ID.

- Sunny December 04, 2013 | Flag
0 of 0 votes

timestamp/n, where n runs from 1 up to the number of requests in that second; n is reset to 1 when the timestamp changes.

- brian December 04, 2013 | Flag Reply
0 of 0 votes

Twitter has built a service called 'Snowflake' to solve this precise problem. There's a very good explanation in the README at
github.com/twitter/snowflake/

- Anonymous December 07, 2013 | Flag Reply
0 of 0 votes

If the request ID is not required to be consecutive, there is an easy way to generate unique request IDs with zero collisions.

If we have multiple threads generating IDs, we can give every thread a distinct ID. This is easy to implement, but you cannot use the OS thread ID directly.

After an ID is assigned to each thread, each thread can give every request a local ID. The globally unique request ID is then formed as TimeStamp + ThreadID + LocalID.
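
A sketch of that TimeStamp + ThreadID + LocalID scheme, where the thread ID is assigned by the application rather than taken from the OS (names are illustrative):

import java.util.concurrent.atomic.AtomicInteger;

public class PerThreadIdGenerator {
	private static final AtomicInteger NEXT_THREAD_ID = new AtomicInteger();

	private static final ThreadLocal<PerThreadIdGenerator> LOCAL =
			ThreadLocal.withInitial(() -> new PerThreadIdGenerator(NEXT_THREAD_ID.getAndIncrement()));

	private final int threadId;
	private long localId = 0; // only touched by the owning thread, so no locking needed

	private PerThreadIdGenerator(int threadId) {
		this.threadId = threadId;
	}

	public static String nextId() {
		PerThreadIdGenerator g = LOCAL.get();
		return System.currentTimeMillis() + "-" + g.threadId + "-" + g.localId++;
	}
}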

- Anonymous December 16, 2013 | Flag Reply
0 of 0 votes

Concatenate (time_stamp + request_source_specs + request_handler_machine_thread_specs). We could also implement a queuing mechanism that ensures every request is processed one after another and use a simple counter for unique ID generation; however, that may become a performance bottleneck.

- Vaibhav Gorde December 16, 2013 | Flag Reply
0 of 2 votes

import java.util.concurrent.atomic.AtomicLong;

public class UniqueIDGenerator {

	static final int BUFFER_SIZE = 100;

	// Source of fresh IDs; the buffer is pre-filled from it.
	private final AtomicLong base = new AtomicLong();

	private final long[] buffer = new long[BUFFER_SIZE];

	private int bufferIndex = 0;

	UniqueIDGenerator() {
		refill();
	}

	// Serves IDs from the pre-allocated buffer and refills it when exhausted.
	synchronized long getUniqueID() {
		if (bufferIndex == buffer.length) {
			refill();
		}
		return buffer[bufferIndex++];
	}

	// Pre-allocates the next BUFFER_SIZE IDs from the counter.
	private void refill() {
		for (int i = 0; i < buffer.length; i++) {
			buffer[i] = base.getAndIncrement();
		}
		bufferIndex = 0;
	}
}

- Serdar January 18, 2014 | Flag Reply
0 of 0 votes

It's just a buffering problem, like online streaming. The time required to generate a unique ID may be longer than the per-request time budget. If we know the request rate, we can work out the gap and how many unique IDs are needed to fill it. Then we can use a buffer to generate unique IDs ahead of time and hand out IDs from that buffer, while continuing to generate IDs and store them in the buffer.

- hirajhil January 29, 2014 | Flag Reply

