## Amazon Interview Question

Software Engineer / Developers**Country:**United States

The most efficient sorting algorithm has O(nlogn) time complexity where, for this problem, n is big.

In my above algorithm, the trie structure can be built by traversing the list only once --> O(n) time complexity.

Then:

a. if you wanna find if a number is present or not in the list, it will take you at most 8 (64 / 8) comparisons.

b. if you wanna just return an arbitrary number which is not in the list then you can go and check the number of children for each node and pick the missing child. But, in order to avoid the cases in which you may have an "almost complete trie structure" where there is only a number missing and this number is something like 11111111 --> 11111111 --> 11111111 --> 11111111 --> 11111111 --> 11111111 --> 11111111 --> 11111110 (this no. will appear in the trie in the far right hand side and will force you to do a lot of comparisons in order to reach it), you can do an artifice and mark a node as complete once the sub-trie having this node as root is complete. In this case you will know directly where to go in the trie to pick a number which does not appear in the initial list.

If the list contains all 64-bit numbers except only ONE number then XOR all the numbers will give the missing number.

Seiously, two upvotes to this answer?

Do you guys even know how much time it will take to XOR all the (2^64 -1) numbers?

Also, the question is more general. It is a set of 64 bit numbers. That is it. Not necessarily all but one.

Idiots.

Dude bit operations are anyways faster, and more importantly, in any soln. other than the XOR one u will have to travel thru all values and parse them. So I don't think that any other soln will be extremely less in time complexity than the XOR one.

doesn't seem to be working all the time..

long a[] = { 1, 3, 6, 2, 8, 0, 14 };

xor all the elements gives 0.. which is already part of array..

I thnk using XOR is a good choice. Because it IS a pseudo random number generator, which can be used to generate and check if it already exists.NOR is another alternative..

What if we XOR all the elements from 0 to 2^64 ( in CPU by just incrementing the counter ); like following,

```
unsigned int i = 0;
unsigned int x = 0;
do {
x = x ^ i;
i++;
} while (i != 0)
```

and then we read each element from the list one by one and xor with the 'x' in the previous code.

Using this we will map this problem to each element is repeated twice except one.

The question didn't specify missing only 1 specific number, and I think the main point is there may not be enough memory to do sorting of all the data at once.

We need to how much memory we have to perform operations, let's say, if we have 2MB of memory (2^21 Bytes), then we can create an array of 2^16 of type long, and count the occurence of the first 16 bits (using bit-wise & operation) of those 64 bit numbers. O(n)

Then we could pick the number with the least occurrence ( gaurantee to have missing number ), then from those numbers, we pick the 17-32 bits and do the same thing. O(n)

Until the last 16 bits, we could have a 2^16 array of bool, to hash with the last 16 bits of those picked set of numbers, to find some missing bits. O(n)

Then we could combine all the 4 16bits segment to get a missing element

If only one number is missing... it can be easily found...

make a structure with

{position, no of occurring of 0,no of occurring of 1) in binary format of number.

for 3 bit number in binary format we have 3 position's

if all elements are present

values will be (1,4,4) (2,4,4) (3,4,4)

suppose from 0 to 7 6 is missing...

the structure will result as

(1P,4,3) (2P,4,3) (3P,3,4)

that mean binary digit 1 missing from index 1, 1 missing from index 2 and 0 is missing from index 3

possible missing number will be 110 that is 6.

this might be generalized in case of more than 1 number is missing..

The idea is to use binary search. Rough sketch follows.

1. Start with high as INT64_MAX (the maximum number that can be represented in 64 bits - signed) and low as INT64_MIN. Now, the mid element will be 0.

2. Go through the elements and count which are < mid and > mid.

3. If if the numbers < mid are more than the count of numbers > mid, then low = mid, else high=mid.

Progressing like this, we will get a number which does not exist in the list.

This will have O(64*n) = O(n)

Hi guys, I'm thinking of using a B+ tree that each node in the tree will have two values (call it lowerbound and upperbound) indicating the range that node will cover

when reading an input, insert it to the proper location in a B+ tree, if the difference between the newly inserted node and its neighboring node(s) is one, delete this node, and merge it with its neighbor by updating its neighboring node's lowerbound or upperbound depending on the inserted node's position.

in this way we don't need to store all numbers of the unsorted list. in the case if we only have one number missing, the size of the b+ tree will be increasing at first, but will shrink down eventually to 1 node (if the missing number if 0 or 2^64-1) or 2 nodes (if the missing number is any number between 0 and 2^63-1)

take an array of size 64 to calculate difference in number of 1's and 0's at a particular bit position.(ideally 2^64 numbers have 2^63 1's and same numbers of 0'2 at each bit position).So bring some numbers in memory and calculate difference for each postion. after u are done with all the numbers what u will have is an array of size 64 having either -1 or +1 at each index so let's assume we are taking difference of (numbers of 0s') - (numbers of 1s') then wherever it is -1 put 0 at that position and wherever it is 1 put a 1 at that position and finally u will get binary representation of that missing number.

64 bit means its a long type number . we can say set consists of long integers.

we can use bit array.

take bit array, initiallise with 0

traverse each element once in the set and set 1 in bit array.

traverse the bitarray again and find bitarray index set to 0

that will be required number

i think it is not possible in less than O(N)

nice thing u pointed out

it requires an array of 2^59 size which may not be posssible.

so how it is possible in o(N) can any body tell

If only one number is missing... it can be easily found...

make a structure with

{position, no of occurring of 0,no of occurring of 1) in binary format of number.

for 3 bit number in binary format we have 3 position's

if all elements are present

values will be (1,4,4) (2,4,4) (3,4,4)

suppose from 0 to 7 6 is missing...

the structure will result as

(1P,4,3) (2P,4,3) (3P,3,4)

that mean binary digit 1 missing from index 1, 1 missing from index 2 and 0 is missing from index 3

possible missing number will be 110 that is 6.

this might be generalized in case of more than 1 number is missing..

Find max or min number and add or sub 1 from that. Check for overflow or underflow cases

Guys relax,

If there is only one missing integer then 2^64 - sumOf (givenArray) will solve the problem.

Use binary search.

Let say max number is N = 2^64.

Then set pivot as p = N/2,

Partition the list with first part as number <= p, second part as >=p;

if(#number in fist part < p) then do search first part

else search second part.

for each partition will be O(n), and will be O(n*lgN), which will be O(n*64)=O(n) in worst case

you bunch of idiots, by saying that the list s very big he is implying that the list is not to be traversed...

more importantly it is a behavioral question, the answer can be any number greater then 2^64 that can never reside in a list containing 64 bit integer list..

so the number can be 2^64+1 or 2^64 and so on....

hey guys,

- thunder.thd September 16, 2011What about using a trie structure to represent those numbers? In which you have each number split in 8 groups of 8 bits (or lower no. of bits).

So, for ex. 123456789123456789 (00000000 11011011 01001101 10100101 11010110 01101000 00101111 100010101) will be represented by the path: 00000000 --> 11011011 --> 01001101 --> 10100101 --> 11010110 --> 01101000 --> 00101111 --> 100010101 in the trie structure.

After the entire trie structure was built it will much easier to find a number which is not in the list.

If the list is too big to fit into memory, we can store the trie structure on the disk.

This is how Lucene stores the numeric values in its index for a fast search.