Amazon Interview Question for Research Scientists


Country: United States
Interview Type: Phone Interview




Comment hidden because of low score. Click to expand.
1
of 1 vote

Using a Binary Search tree can solve this problem. While traversing the BST we can continue to track the "Minumum difference" found till the current Node and return the "Global minumum" found at the end.
Time O(lg n) and constant space.

public float findClosest(float element) throws IllegalArgumentException{
		if (head == null){
			throw new IllegalArgumentException();
		}
		Node traversePointer = head; 
		float min = Math.abs(head.value - element);
		float minElement = traversePointer.value;
		
		//Traverse through all the nodes and find out
		while (traversePointer != null ) {
			if (min > Math.abs(traversePointer.value - element)) {
				min = Math.abs(traversePointer.value - element);
				minElement = traversePointer.value;
				if (min == 0) {
					return minElement;
				}
			}
			
			if (traversePointer.value > element) {
				traversePointer = traversePointer.left;
			}else {
				traversePointer = traversePointer.right;
			}
			
			
		}
		return minElement;
	}

- successboy123 September 25, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

I missed to add. The reviewer further mentioned that I can not sort and that I can use any amount of time for creating the data structure.

Since, I am not allowed to 'sort', I assumed that I can not use BST as I will have to compare numbers while populating the tree. But I didn't clarify it with him; I should have in hindsight.

- Anonymous September 25, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

please edit your post then, adding that information. but that "sort" constraint is weird

- Miguel Oliveira September 25, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Are you sure this BST approach always work?

how you are constructing your BST to be sure search time is O(logn) (balanced BST).
I suppose you need to sort the given set and then build the BST to get O(logn) search time.
If we are ignoring this fact, then yes BST approach will give the correct and optimized solution.

- varun October 08, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Its a very interesting question I suppose. Now I think the answer is very straight forward unless I am totally mistaken. My code follows:

void printNearest()
{
	float modulo = -1; //Init to -1 for starter
	float array[MAX_SIZE]; // MAX_SIZE = total number of floats
	float x; // Given number 	

	loadFloats(array); // Load all the floating point numbers

	List<float> closestNums = new ArrayList<>(); // Java 7 style	
	for(int i=0; i<array.length; i++)
	{
		float num = array[i];
	        float moduloDiff = (num-x)<=0 ? -(num-x) : (num-x);
		if(modulo == -1)
		{
			modulo = moduloDiff;
			closestNums.add(num);
		} 
		else if(modulo > moduloDiff)
		{
			modulo = moduloDiff;
			closestNums.clear();	
			closestNums.add(num);
		}
		else if(modulo == moduloDiff)
		{
			closestNums.add(num);
		}
		else
		{
			continue; //Ignore the number, it is not near
		}
	}
	
	printAllNums(closetNums); //Print all the numbers

}

Furthermore since java flots & decimals are not very accurate I would adopt BigDecimal if that becomes an issue.
All comments are very welcome.

- Itanium September 25, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

Like you said, that's the very straightforward way to solve the question, answering each query of a point X in O(N) time.

This kind of question ("goal is to efficiently find the number that is closest to 'x' from the fixed set", "what data structure will you actually use for storing the fixed set of floating point numbers to achieve this?") implies that we need to do better than the trivial approach.

- Miguel Oliveira September 25, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Of course this is brute force search, and it has the worst case complexity of O(n) i.e. linear in the total floating point numbers in the set. In contrast BST would be O(log n) and the trie approach would be O(m) where m is the length of the 'longest float in n'.

- b2010 September 25, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

I think trade off is the key point here. Why do we want a memory intensive data structure approach if the given float number set is small enough/not too big? This would more than suffice. However if you are talking about scaling then I challenge all the BST & other k-dimensional tree would choke the memory axis eventhough it may 'theoretically' fare better than this brute force approach on time axis...

- Itanium September 25, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

I guess in the above code the bulkiest part is the ArrayList. I am trying to explore growable arrays, but not very sure how it works...

- Itanium September 25, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

I think the key point is clarification about what could be considered as 'sort', in the context of the problem.

IMO, using a BST is not sorting per se. It's more like structuring the data for searching optimization.

- Vic September 25, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

Yes, I agree to some extent. The BST indexing could be interpreted in either way. As I mentioned earlier, I should have clarified this point with him, but that was only in the hindsight.

- b2010 September 25, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

@Itanium, Well I really can't argue with him in the interview as to, what the ideal solution *should* be. My job in the interview is to try and come up with some solution that also satisfies the constraints. And I believe he was definitely looking for an efficient solution that would have smaller time complexity.

- b2010 September 25, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

Not accusing you but you never mentioned time only efficiency in your question. If you just say efficiency it refers to product of Space & Time efficiencies.
i.e. Efficiency = SxT

And therefore we look for a tradeoff according to the given situation.

- Itanium September 26, 2013 | Flag
Comment hidden because of low score. Click to expand.
1
of 3 votes

@Itanium: Does it? I have never heard anyone define "efficiency" as space complexity * time complexity.

- eugene.yarovoi September 26, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

@Eugene.Yaraovi
Why do you think space is so subordinate to time? Had space not been a serious performance metric, sorting algorithms like external sort that operate on huge files would not have been useful at all.

- Itanium September 26, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

To clarify I meant to say Efficiency is proportional to S x T & not exactly equal..... I admit that was a mistake.... Its not really a textbook definition.

- Itanium September 26, 2013 | Flag
Comment hidden because of low score. Click to expand.
-1
of 1 vote

@Itanium: No part of my comment says or implies that space is subordinate to time. I was responding to this part of your post:

"If you just say efficiency it refers to product of Space & Time efficiencies.
i.e. Efficiency = SxT"

You're presenting it as if it were a common definition. I was saying that, as far as I know, this definition is completely made up, and it doesn't exist as a recognized concept. Nor is defining it as "proportional to" space * time a recognized definition.

- eugene.yarovoi September 27, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

@successboy123, Good point. However, all I have to do to avoid this issue is to maintain multiple search paths until each one of them can be *safely* deleted.

Thus, in the present case, I'll include paths "1-4-." and "1-5-." and when I reach the next set of nodes I can delete the path "1-4-.-0" and only retain "1-5-.0". And if there was as 13.99, I would also retain "1-3-." until it reaches the next digit, for the given query 14.99. Isn't this correct?

- b2010 September 26, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

if you see my reply to your post above, i think that we'll end up following almost all paths. Thus it won't be much better than the O(n) linear scan.

- Miguel Oliveira September 26, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

I think we should not look for 0(n) scan...because for 0(n) we can store the elements in normal array and can get the element in just 0(n) time...We can do it in BST with logn complexity..

- Dinesh Pant September 26, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

Hi,

obviously you can use BST and then you may try to find floor/ceil (google it for BST) of the given value and choose what is closer to the given value.

- joe kidd September 26, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

why to use any data structure, a normal loop can do ,here is a simple example ,

tell me if i'm wrong...

#include<stdio.h>
#include<math.h>

     void main(){

        float a[5] = {123.4, 123.46 , 127.65 ,123.45, 120.34  };
        int i = 0;
        int j = 0;
        float num = 123.00;
        float temp = fabs(num-a[0]);

         for(i = 0; i< 5 ;i++){
               if(fabs(num-a[i]) <= temp){
                   j = i;
                   temp = fabs(num-a[i]) ;
                 }

          } 
           printf("%f\n",a[j]);

     }

- sourcecode September 26, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

the "why" was already answered before. check the other posts

- Miguel Oliveira September 26, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Why would you want to sort / BST? That takes O(n ln n). A simple linear scan is easy and takes O(n).

Got to be honest, don't really see how this is an interesting question!

- Anon September 26, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

BST would be O(log n), not O(n log n). Also pls read the previous answers and discussions.

- b2010 September 26, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

It seems that many people don't read the comments. Maybe you could edit the question and say explicitly that we have many queries and we want better than O(n) per query

- Miguel Oliveira September 26, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

Just a thought. Why not use hash map? Key as floating numbers and value as "difference between floating number and x. find the key with minimum diff .

- ashi September 29, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

The idea is to have a fixed set of N points and then have multiple queries with different values of X. So, that won't help

- Miguel Oliveira September 29, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

You can probably use a consistent hash ring (circular hashing).

- noisebar October 02, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Since the floating numbers are given and storing the numbers is not a problem.
So we store all the floating numbers in Trie.

Searching the closest number will take length of the floating number to be searched.

- Psycho October 02, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

I think we can use a hash function for this. The important point is that it is a floating point. So, first write the floating points a exponent functions (e.g; 1003 = 1.003 * 10^{3}). Then uses nested hashmaps (e.g hashmap(integer*exponent, <hashmap(first_decimal, <haspmap(second_decimal,...)>)>). We can do so until we reach a predetermined depth, and just use brute force on the rest.

Complexity for searching O(M+L); where 'M' is the depth and 'L' is the length of the list of number at the end to search from.

- rchala December 13, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

heap should the right data structure to go

- Mo June 30, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
-1
of 1 vote

There are many possible options - en.wikipedia.org/wiki/Nearest_neighbor_search

I think we should go for a Space Partitioning method.

I would pick a Kd-Tree (where K is the number of dimensions of the points). This would allow us to answer each query of a point X in O(log N) time, where N is the number of points of the fixed set.

- Miguel Oliveira September 25, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

Thanks for the answers. Even the k-d tree or for that matter any space partitioning method requires comparison to identify their order as to which goes to the left and which one to the right. Correct my if I am wrong.

My own answer during the interview was bit unconventional (I am not sure if the interviewer was convinced though).

I opted for a trie data structure for storing the set of floating point numbers, where each node would either store a digit or a decimal point. Thus the height of the trie would be length of the longest floating point number. The traversal at each node would involve finding the node that matches the corresponding digit in the number being tested or finding nodes that are at same distance. I will have to maintain multiple paths during the search until I can safely eliminate all but the closest path.

The problem in this is that in order to get the individual digits, I need to convert the number to string and back to number. While this is not an issue for populating the trie as I don't have to worry about the efficiency here, it will be an issue during retrieval.

Any thoughts about this approach?

Of course, BST or k-d tree would have been straight forward answers if 'sorting' constraint was not there.

- b2010 September 25, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

I'm not sure that the trie approach ends up being better than a linear scan through the N points. The trie stores the most significant digits first and we don't know the order of magnitude of the points represented in the children of a node. We can have numbers with 3, 4, 10 digits, so we end up traversing all points (we have little potential to cut the search).

The sorting constraint is a bit weird to me. I would like to clarify it a bit better with the interviewer.

If we completely ignore sorting, maybe we should go for a sort of Locality Sensitive Hashing. The goal of the data structure would be to group the points in buckets of points close to each other.
At this point, I would also ask the interviewer if we need 100% correctness or if an approximate answer is ok.
Let's say we group the points in O(sqrt(n)) buckets. To answer a query, we could pick a random sample of each bucket and find the bucket closest to point X. The 100% correctness part is relevant here. Since we group closest points together, we expect that the selected bucket contains the closest point (although it can be wrong a few times).
Then, we could brute force the elements of this bucket to find the closest point to X, yielding O(sqrt(n)) per query.

We can try to have O(log(n)) buckets but the sorting constraint makes it hard to have a good performance checking the n/log(n) elements of a bucket.

Still, i think that grouping the points into proper balanced buckets can be quite hard. In an interview I would go for a straightforward approach. First by going through each point and selecting the ~sqrt(n) available points closest to it. Or trying to binary search the maximum distance between points in a bucket and check how many buckets a given threshold yields.

What do you think?

- Miguel Oliveira September 25, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

I don't know if the Trie data Structure would work. For example lets assume that the Trie contains "14.01" and "15.01" and we are searching for "14.99". In the Trie we would proceed as follows "1" -> "4" -> "." Then the only subtree is "01" and Hence the result we get is "14.01" while in reality "14.99" is closer to "15.01".

- successboy123 September 25, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Reproducing what I said in other post below:

I think trade off is the key point here. Why do we want a memory intensive data structure approach if the given float number set is small enough/not too big? This would more than suffice. However if you are talking about scaling then I challenge all the BST & other k-dimensional tree would choke the memory axis eventhough it may 'theoretically' fare better than this brute force approach on time axis...

- Itanium September 25, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Itanium, you're missing the point of the question. We have a fixed set of points, but we want to do several queries with different points X. The trivial approach takes O(Q*N), but we would like it to be O(Q*log(N)) or O(Q*sqrt(N)) at least.

Now i also see that i misread the question. I thought we were talking about 2d or 3d points, not just an array of floats. I still don't see a better approach avoiding sorting.

- Miguel Oliveira September 25, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

@Miguel,
I see what you are saying, but don't you think heavy data structures would also degrade performance during high frequency invocations?

- Itanium September 26, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

While the constant factor might be big for certain data structures, it's still quite worth it due to the O(n) vs O(log(n)) time.

In this case, the data structure does not need to change between queries, I don't see any reason for the performance to degrade.

- Miguel Oliveira September 26, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

@successboy123 on September 25, 2013 :

struct trie_node
{
bool isEnd; // Used to mark leaf nodes
trie_node_t *children[ALPHABET_SIZE];
bool isDecimalEnd; // marked when a decimal ends
};

Here we can detect 14.99 closer to 15.01

- Psycho October 02, 2013 | Flag


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More