Microsoft Interview Question
I think one key point was missed here. What if the first word of one of the files is 1 GB long? Will you read the whole 1 GB into memory?
My guess is that this will take O(n*c) + O(n log n), where n = number of files in the directory and c = total number of characters you have to read in order to do the sort. What do you all think?
I can send a pseudo algo if anyone is interested.
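For the 1 GB case, here is a minimal sketch in Python (mine, not from the thread; the helper name read_prefix and the cap parameter are illustrative) of reading only a bounded prefix of a file's first word, so the whole word never has to sit in memory:

```python
def read_prefix(path, cap):
    """Read at most `cap` characters of the first whitespace-delimited
    word in the file at `path`, one character at a time, so a 1 GB
    word is never pulled into memory all at once."""
    chars = []
    with open(path, "r") as f:
        ch = f.read(1)
        while ch and ch.isspace():                # skip leading whitespace
            ch = f.read(1)
        while ch and not ch.isspace() and len(chars) < cap:
            chars.append(ch)                      # keep only up to `cap` chars
            ch = f.read(1)
    return "".join(chars)
```

In practice you would read buffered blocks rather than single characters, but the point is that memory stays bounded by cap regardless of the word's length.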
Pseudo Code:
Let n be the number of files
Let c be the total characters read to sort all the files
Data structures:
1. Array of n strings for storing the strings
2. Array of n integers for the length of each string
1. For each file, read characters from the file and append them to that file's string. O(n * c)
2. Convert the string to a number by treating each character as a digit in a base B at least as large as the character set. For example, if the string is abc, the corresponding number is (a * B^2 + b * B + c). (Base 10, as in a * 100 + b * 10 + c, would collide, since character values exceed 9.) Keep another corresponding array for the length of each string. O(n * c)
3. Sort the numbers using quicksort. If two numbers are equal, compare their corresponding string lengths. O(n log n) (Note: assuming the numbers fit in machine words, each comparison is O(1), so this step is independent of the length of the strings)
4. Check for duplicates in the sorted array: O(n). If there are any, read further characters and repeat steps 1 - 4
There are some corner cases that need to be handled, like checking for EOF when reading from the file, but this is the rough logic; a sketch follows below.
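As a rough sketch of steps 1 - 4 in Python (building on the read_prefix helper sketched earlier in the thread; the starting cap of 64 is an arbitrary choice, the built-in sort stands in for quicksort, and direct string comparison plays the role of the base-B number comparison; for simplicity it re-reads each file from the start every round instead of appending to the existing string as step 1 describes):

```python
import os

def sort_files_by_first_word(directory):
    """Order the files in `directory` by their first word, re-reading
    longer prefixes only while ties remain unresolved (step 4)."""
    files = [os.path.join(directory, name) for name in os.listdir(directory)]
    cap = 64                                                 # arbitrary starting prefix length
    while True:
        prefixes = {p: read_prefix(p, cap) for p in files}   # steps 1-2
        ordered = sorted(files, key=lambda p: prefixes[p])   # step 3
        # Step 4: a tie is only ambiguous when both prefixes hit the cap,
        # i.e. more characters might still break it.
        ambiguous = any(
            prefixes[a] == prefixes[b] and len(prefixes[a]) == cap
            for a, b in zip(ordered, ordered[1:])
        )
        if not ambiguous:
            return ordered
        cap *= 2                                             # read more characters, repeat
```

Memory per round stays bounded by n * cap characters, which addresses the 1 GB first word: it is only read as far as needed to break ties.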
Does this make sense?
Iterate over all files in the directory and create an array A of strings as follows:
- <First word of file> + ":" + <Filename>
Sort Array A
Iterate through the array, parse out the filename and print it
Time complexity: O(n) + O(n log n) + O(n) = O(n log n), where 'n' is the number of files in the directory (counting string comparisons; each comparison can cost up to the length of the first words compared).
Space complexity: O(n) entries, though each entry holds a full first word.
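A minimal sketch of this approach in Python (the function name is mine; it assumes the first word sits on the first line of the file and contains no ':'):

```python
import os

def print_files_sorted_by_first_word(directory):
    entries = []
    for name in os.listdir(directory):              # O(n) pass over the directory
        with open(os.path.join(directory, name)) as f:
            tokens = f.readline().split()           # assumes the first word is on line 1
            first_word = tokens[0] if tokens else ""
        entries.append(first_word + ":" + name)     # <First word of file> + ":" + <Filename>
    entries.sort()                                  # O(n log n) comparisons
    for entry in entries:
        print(entry.split(":", 1)[1])               # parse out the filename and print it
```

Note that readline() still pulls the whole first line into memory, so the 1 GB concern raised at the top of the thread applies here too.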