Microsoft Interview Question
Software DevelopersCountry: United States
Interview Type: In-Person
Assumption: "if a repetition, shoud not store it" means only store the number once, if repetition should flag the number as not being printed, below discussion requires double the memory.
1. Option, big-O-optimized version:
- Store the numbers in a bit-vector, that is aprox 1.3 MB of Memory for range 1-10 Mio
since 1 - 10 Mio is given this is O(1) ...
- Printing, iterate over 1..10Mio and print the number if flagged, this is O(1), too
If this fits into memory, is it better than storing a btree or something on disc? It most certainly is, because we insert n times, and printing requires about 160'000 iterations over a 64 bit integer and only needs to drill into the integer if it is not 0. This is all in data and execution cache.
If only a few numbers, like 10000 or so occure before -1, it is slower. Depending on the memory size, we could optimize the small amount of numbers with some sort of tree and throw the tree away as soon as we exceed a certain amount of numbers.
Going to disc is an alternative, there it makes sense to create a number of files with fixed size (according to the available memory), each sorted and merge sort them when printing on the fly. This solution now is O(n) because we create constant size files and sort them which is O(1) due to the constant but we require to traverse each number when printing (whether it was a dublicate or not).
Conclusion:
if n is expected to be > 100'000 (or even >> 10 Mio) and almost uniform: use bitfield if oyu can afford that memory.
if n is >> 10 Mio: use bitfield on disc
if n << 10 Mio: use the merge sort approach on disc (maybe one get's lucky often and the -1 comes before the first file was written...)
Create a array of long so that its size can be 10,000,000. Now for each getNum() value, treat the number as index and put 1 there. check the condition for -1 and exit when getNum() returns it. While writing, just read all values starting from index 0 till last index where value is 1 and print index as expected value. this will have o(n) printing, o(1) inserting and only need o(n) size.
Simplest is storing into a file, and then use standard sorting techniques to sort them later.
Given the digits are fixed, we can create a directory tree with LSB at the top, and MSB in the leaf, while the leaf storing the number of repetitions.
So, 1023456 would be stored as :
/6/5/4/3/2/0/1/[count=1]
while 1024 would be stored as :
/4/2/0/1[count=1]
Now, what happens when we do postorder traversal on the tree with children in sorted order? All one needs to do is to store the path from root, and reverse it, and then we have our answer. For multiple occurrence, simply repeat the same count no of times, and we are good.
The max depth of that tree would be 8 ish in base 10.
You can use binary search and insertion for each element generated. The O(log n) will enable you for minimum memory utilization assuming only half array will be in memory for each computation. We can publish the array to the disk. Keeping the array in sorted order can help in generating a stream for the same and printing while fetching from any disk storage.
A trie is perfectly suited for this. A trie by definition does not contain duplicates.
10 Million is 10,000,000. The max depth is 8.
Once the trie is built, the numbers can be constructed in sorted order by doing a level by level left to right traversal appending the value at each node to the parent.
when we know the number range from 1 to 1M in radom fashion with repetions (those repetions we can ignore so we can just say numbers between 1 to 1M it only prints -1 when out of numbers means all numbers will be printed at least once). SO just print the numers in or loop from 1 to M, the moment we receive a -1.
Use BitArray, because of the limited memory requirement
- vh.vahan January 06, 2017