Amazon Interview Question
Development Support Engineersfirst of all we dont be able to load 50G of log file into memory. you could only load a portion of log file and check if that portion contains the log corresponding to 3 to 5 PM.
A log file with events will always be sorted according to time stamp in increasing order of only a single thread is writing to log file.
Agree with tarun, however, it needs to be optimized. 50G of file may be on one disk or may spawn multiple disks based on the file structure. Also, reading each line means accessing disk so many times and will slow down the process. Assuming the program is running on a 4G RAM machine and assuming 2GB of RAM is available for the program, this can be completed in 25 disk accesses. However, one should also consider that this is not a guaranteed performance. Reading 2GB at a time may not read just 2GB in one shot. You may not have enough space in your stream buffers. You may not have enough paging memory left for other programs and this may altogether bring the system down. The solution is a best approximation of all parameters.
Each record contains a timestamp , message type and message body . I am assuming that each record is of fixed size.
Based on this assumption, i have a better solution.
Since the timestamp is going to be increasing order , we can do a mechanism similar to a binary search by using file pointers using functions like fseek(). Since we know how many bytes each record is taking we can seek the pointer to the middle of the file. For getting the file size , we can use fstat() function.
Amazon is famous for asking such questions, the obvious answer is some grep or awk command, they dont expect you to try writing a program for something like this.
Read this - sites.google.com/site/steveyegge2/five-essential-phone-screen-questions
1. take each line -> split it up with a delimiter and put it in a string array of 3[timestamp messagetype and message body ]
- Tarun April 24, 20102. for each iteration check for index 1 for timestamp>3 pm and timestamp<5 pm
if true then move inside the loop
3. inside the loop compare the index 2 == with FATAL[do a findsubstring] , if matches then return the contents of this string array else continue with loop.
Space : 1 String Array with 3 indexes [we will overright each time]
Time Complexity : have to traverse all the log file hence O(N) atleast