Microsoft Interview Question for Software Engineers


Country: United States
Interview Type: In-Person

Question: Sort a 50GB file of integers on a machine with only 200MB of RAM.




1 of 1 vote

External merge sort.
O(log n) passes through the file.
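To make the pass count concrete for this question's numbers (a rough sketch of my own, assuming a 256-way merge; these figures are not from the comment above): sorting 50GB in 200MB chunks leaves 256 sorted runs, and a single 256-way merge pass finishes the sort, i.e. roughly ceil(log_k(#runs)) merge passes for a k-way merge.

using System;

// Back-of-the-envelope pass count (assumed figures: 50GB file, 200MB memory, 256-way merge).
long fileBytes   = 50L * 1024 * 1024 * 1024;
long memoryBytes = 200L * 1024 * 1024;

long runCount = fileBytes / memoryBytes;  // 256 sorted runs after the run-generation pass
int fanIn = 256;                          // one input buffer per run still fits in memory

// ceil(log_fanIn(runCount)) merge passes; here that is 1, so 2 passes over the data in total.
int mergePasses = (int)Math.Ceiling(Math.Log(runCount) / Math.Log(fanIn));
Console.WriteLine($"runs = {runCount}, merge passes = {mergePasses}");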

- zr.roman January 02, 2016 | Flag Reply
1 of 1 vote

External counted bitmap sort, O(n) complexity.

Since these are integers (assumed 32-bit), there are two properties we can benefit from:
1. The range of values is finite, so we can use a counted bitmap to partition the range into sub-ranges.
2. The file is larger than the space of possible values, which means there are duplicates, which will need to be counted.

We split the 32-bit value range into sub-ranges such that one sub-range's counters fit in the available memory. In the worst case a single value is repeated through the entire file, so each counter needs to be 64-bit. At 8 bytes per counter, 200MB of memory holds about 2^24 counters, so we split the 32-bit range of values into 2^32 / 2^24 = 2^8 = 256 sub-ranges.
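Spelling out the memory arithmetic (a quick check of my own, not part of the original answer):

using System;

// 8-byte counters, so a single value repeated across the whole 50GB file cannot overflow.
long memoryBytes  = 200L * 1024 * 1024;          // 200MB budget
long maxCounters  = memoryBytes / sizeof(long);  // about 26 million counters fit
long countersUsed = 1L << 24;                    // use 2^24 of them = 128MB of counters
long rangeCount   = (1L << 32) / countersUsed;   // 2^32 values / 2^24 per bucket = 256 sub-ranges
Console.WriteLine($"counters = {countersUsed} ({countersUsed * 8 / (1024 * 1024)}MB), ranges = {rangeCount}");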

Scan the original file and write each value to its corresponding temp file, in the original order.
Then, for each temp file, in range order:
Scan the file and increment the counter for each integer read.
Scan the counters and append each value to the final file, repeated as many times as its count.

using System;
using System.IO;

public static void sortIntegerLargeFile(string filePath)
{
  const int RangeCount = 256;          // 2^8 buckets, keyed by the top byte of each value
  const int ValuesPerRange = 1 << 24;  // 2^24 possible low-24-bit values per bucket

  // Pass 1: partition the input into 256 temp files by the top byte of each value.
  FileStream[] outTempStreams = new FileStream[RangeCount];
  BinaryWriter[] outTempBinaries = new BinaryWriter[RangeCount];
  for (int i = 0; i < RangeCount; i++)
  {
    string strTempFilePath = @"C:\temp\sortRange" + i;
    outTempStreams[i] = new FileStream(strTempFilePath, FileMode.Create, FileAccess.Write);
    outTempBinaries[i] = new BinaryWriter(outTempStreams[i]);
  }

  using (FileStream inStream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
  using (BinaryReader inBinary = new BinaryReader(inStream))
  {
    long valueCount = inStream.Length / sizeof(int);
    for (long n = 0; n < valueCount; n++)
    {
      int intRead = inBinary.ReadInt32();
      // Cast to uint so the bucket index is always 0..255; note that this orders values
      // by their unsigned bit pattern (fine when the data is non-negative).
      int range = (int)((uint)intRead >> 24);
      outTempBinaries[range].Write(intRead);
    }
  }

  for (int i = 0; i < RangeCount; i++)
  {
    outTempBinaries[i].Close();  // flushes the writer and closes the underlying stream
  }

  // Pass 2: for each bucket, count every value, then emit the values in order.
  using (FileStream outStream = new FileStream(@"C:\sortedFile", FileMode.Create, FileAccess.Write))
  using (BinaryWriter outBinary = new BinaryWriter(outStream))
  {
    for (int iRange = 0; iRange < RangeCount; iRange++)
    {
      // 2^24 64-bit counters = 128MB, which fits in the 200MB budget.
      long[] rangeCounts = new long[ValuesPerRange];

      string strTempFilePath = @"C:\temp\sortRange" + iRange;
      using (FileStream inTempStream = new FileStream(strTempFilePath, FileMode.Open, FileAccess.Read))
      using (BinaryReader inTempBinary = new BinaryReader(inTempStream))
      {
        long valueCount = inTempStream.Length / sizeof(int);
        for (long n = 0; n < valueCount; n++)
        {
          int intRead = inTempBinary.ReadInt32();
          int index = intRead & 0x00FFFFFF;  // low 24 bits select the counter within the bucket
          rangeCounts[index] += 1;
        }
      }

      for (int iValue = 0; iValue < ValuesPerRange; iValue++)
      {
        int sortedValue = (iRange << 24) | iValue;
        for (long count = 0; count < rangeCounts[iValue]; count++)
        {
          outBinary.Write(sortedValue);
        }
      }
    }
  }
}

- florinb1 January 06, 2016 | Flag Reply
0 of 0 votes

External sorting is the standard way to do this.

We start by fetching chunks of the 50GB file into memory.
With 200MB of RAM, let's take the machine's page size to be, say, 10MB.
# Fetch the first 10 pages of the file (100MB of RAM used).
# Quicksort each page.
# Now we have 10 pages of sorted integers.
# Merge the 10 sorted pages into a new 100MB block (all 200MB now in use).
# Write those first 10 pages of data back to the file.
# Repeat for the next 10 pages (50GB / 100MB = 512 times).

** Now we have a 50GB file in which every 100MB block is individually sorted, i.e. 512 blocks of sorted integers.

# Treat the 512 blocks as 4 parts of 128 blocks each.
# Fetch the first 1MB of each of the 128 blocks of the first part.
# Merge them and keep writing the output to disk (at a new location).
# Use double buffering: when a block's 1MB is consumed, discard it and fetch that block's next 1MB.
# Do the same for the other three parts.
# Do another double-buffered merge over these 4 new parts and overwrite the old file.


This is one way of doing it; any comments on improving this approach are welcome.
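For illustration, a minimal sketch of the merge phase (my own code, with made-up file names; it assumes .NET 6+ for PriorityQueue and that the run-generation pass above has already written the sorted run files). With a small buffered reader per run, all 512 blocks can even be merged in a single k-way pass instead of two levels of 128, since 512 x 64KB of input buffers is only 32MB:

using System.Collections.Generic;
using System.IO;

public static void MergeRuns(string[] runPaths, string outputPath)
{
  var readers = new BinaryReader[runPaths.Length];
  var heap = new PriorityQueue<int, int>();  // element = run index, priority = that run's current value

  for (int i = 0; i < runPaths.Length; i++)
  {
    // 64KB buffer per run: 512 runs x 64KB = 32MB, well under the 200MB budget.
    readers[i] = new BinaryReader(new BufferedStream(File.OpenRead(runPaths[i]), 64 * 1024));
    if (readers[i].BaseStream.Length >= sizeof(int))
      heap.Enqueue(i, readers[i].ReadInt32());
  }

  using (var writer = new BinaryWriter(new BufferedStream(File.Create(outputPath), 64 * 1024)))
  {
    while (heap.TryDequeue(out int run, out int value))
    {
      writer.Write(value);  // smallest outstanding value across all runs
      // Refill from the same run if it still has data.
      if (readers[run].BaseStream.Position < readers[run].BaseStream.Length)
        heap.Enqueue(run, readers[run].ReadInt32());
    }
  }

  foreach (var r in readers)
    r.Dispose();
}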

- vsounda1@binghamton.edu January 02, 2016 | Flag Reply
0 of 0 votes

Create a binary search tree. That would suffice for any memory configuration.

- Ramu Valmiki January 06, 2016 | Flag Reply

