Amazon Interview Question for Java Developers


Country: United States
Interview Type: Phone Interview





Maintain a hash table with the following structure.

1. Use a Pair object with (line #, count) elements.
2. The key of the hash table would be the hash code (something like MD5) of the string, and the value would be an ArrayList of Pair objects:
[hashCodeOfString] -> ArrayList<Pair>

For the first file, for each line, compute the hash code to use as the key. As its value, store the corresponding line # with the count initially set to 1.

For the second file, for each line, compute the hash code; if the hash code matches an existing bucket, use the line #s from its Pair objects to compare the current line against the corresponding lines from the 1st file. If the string comparison also matches, increment the count field of the Pair object.

Iterate over the hash table and print the lines with count > 1.
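
A rough sketch of this idea in Java is below. One assumption on my part: it stores the byte offset of each line of the 1st file instead of its line number, so that candidate lines can be re-read with RandomAccessFile when a hash code matches; the file names are placeholders.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CommonLinesByHash {

	public static void main(String[] args) throws IOException {
		// hash code of a line -> list of {byte offset in file 1, count}
		Map<Integer, List<long[]>> buckets = new HashMap<>();

		// "file1.txt" and "file2.txt" are placeholder paths
		try (RandomAccessFile first = new RandomAccessFile("file1.txt", "r")) {
			long offset = first.getFilePointer();
			String line;
			while ((line = first.readLine()) != null) {
				buckets.computeIfAbsent(line.hashCode(), k -> new ArrayList<>())
						.add(new long[] { offset, 1 });
				offset = first.getFilePointer();
			}

			// For each line of file 2 whose hash code matches a bucket, re-read the
			// candidate lines from file 1 and do the exact string comparison.
			try (RandomAccessFile second = new RandomAccessFile("file2.txt", "r")) {
				while ((line = second.readLine()) != null) {
					List<long[]> candidates = buckets.get(line.hashCode());
					if (candidates == null) {
						continue;
					}
					for (long[] pair : candidates) {
						first.seek(pair[0]);
						if (line.equals(first.readLine())) {
							pair[1]++;
						}
					}
				}
			}

			// Lines whose count was incremented appear in both files.
			for (List<long[]> candidates : buckets.values()) {
				for (long[] pair : candidates) {
					if (pair[1] > 1) {
						first.seek(pair[0]);
						System.out.println(first.readLine());
					}
				}
			}
		}
	}
}

Note that this still keeps one small entry per line of the 1st file in memory, which is the objection raised further down the thread.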

- Murali Mohan June 14, 2013

I don't think you'd need that pair object. It only asks to print the lines and not for the line numbers or any extra data.

Iterate over the 1st file and add its lines to the hash map.
Iterate over the 2nd file - on each line: if there is a matching entry in the hash map, print that line. If there are duplicate lines in the files and you only want one instance of each common line, you could remove the entry from the hash map after you print it.
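
A minimal sketch of that set-based idea (the file names are placeholders); it removes each printed line from the set so duplicates are reported only once:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class CommonLinesBySet {

	public static void main(String[] args) throws IOException {
		Set<String> firstFileLines = new HashSet<>();

		// "file1.txt" and "file2.txt" are placeholder paths
		try (BufferedReader reader = new BufferedReader(new FileReader("file1.txt"))) {
			String line;
			while ((line = reader.readLine()) != null) {
				firstFileLines.add(line);
			}
		}
		try (BufferedReader reader = new BufferedReader(new FileReader("file2.txt"))) {
			String line;
			while ((line = reader.readLine()) != null) {
				// remove() returns true only the first time, so each common
				// line is printed once even if it is duplicated in either file
				if (firstFileLines.remove(line)) {
					System.out.println(line);
				}
			}
		}
	}
}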

- Mrs. Jumbo June 14, 2013

@Mrs Jumbo. If you compute hash codes for a billion strings, the likelihood of collisions becomes very high. I therefore thought of using the line numbers to keep track of which lines hash to the same bucket.

While processing the second file, a line may hash to an existing bucket, yet not be a string that exactly matches any line from the 1st file. Therefore, based on the line #s, we need to do exact string matching against those lines from the first file and, if they match, increment the counter of the matching line.

- Murali Mohan June 15, 2013

One billion objects in memory is too much... what if there are 100 billion lines?
Do an external sort and compare.

- anonymous June 16, 2013

The approach by Erasmus will not work because it runs out of memory. The MD5 hash is 16 bytes, so you would need 16 GB just to store the hashes of the lines in the first file.
As the author above states, you perform an external sort in O(n log n) and then do a comparison, which is O(n).

- roberto.triviani November 07, 2013

This can be a variation of LCS where we deal with strings (whole lines) instead of characters. LCS can be solved using DP.

Correct me if I am wrong.

LCS is the algorithm behind the diff command in Unix.
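
For reference, a small sketch of LCS computed over whole lines rather than characters (the file names are placeholders). As the reply below points out, the O(n^2) DP table makes this impractical for billion-line files; it is only meant to show the line-based variation.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class LcsOfLines {

	public static void main(String[] args) throws IOException {
		// "file1.txt" and "file2.txt" are placeholder paths
		List<String> a = Files.readAllLines(Paths.get("file1.txt"));
		List<String> b = Files.readAllLines(Paths.get("file2.txt"));

		// dp[i][j] = length of the LCS of the first i lines of a and first j lines of b
		int[][] dp = new int[a.size() + 1][b.size() + 1];
		for (int i = 1; i <= a.size(); i++) {
			for (int j = 1; j <= b.size(); j++) {
				dp[i][j] = a.get(i - 1).equals(b.get(j - 1))
						? dp[i - 1][j - 1] + 1
						: Math.max(dp[i - 1][j], dp[i][j - 1]);
			}
		}
		System.out.println("LCS length (in lines): " + dp[a.size()][b.size()]);
	}
}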

- imrhk June 14, 2013

This can be solved using the LCS algorithm, but the constraint is that we have two files of a billion lines each, and LCS has O(n^2) space as well as time complexity. This approach is hence not advisable here. Better to use hash tables.

- sarathsomana June 15, 2013

In Java we can use RandomAccessFile and its readLine method to read lines.

MapReduce design
=====================================================================
1. One map function reads the 1st file line by line, say 1,000 lines at a time (depending on the memory and speed available).
2. Another map function reads the 2nd file line by line, again say 1,000 lines at a time, and looks each line up in the first file's table. If it matches, increase the count.
3. The reduce function takes the hash table keys; for those whose value is > 0, it removes the key from the map and prints the common line (if the volume of lines is very large, save them to external storage using another reduce function).

It can be implemented with threads, in a distributed way, or in parallel, maintaining a hash table of the lines (see the sketch below).
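
This is not Hadoop code, just a simplified single-machine sketch of the same partition-and-compare idea: both files are split into bucket files by line hash so that only one pair of (much smaller) buckets has to be held in memory at a time. The bucket count, file names, and prefixes are arbitrary placeholders.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.HashSet;
import java.util.Set;

public class CommonLinesPartitioned {

	static final int PARTITIONS = 1024; // arbitrary bucket count

	public static void main(String[] args) throws IOException {
		// "file1.txt"/"file2.txt" and the prefixes are placeholder names
		partition("file1.txt", "f1_part_");
		partition("file2.txt", "f2_part_");

		// Only lines in the same bucket can be equal, so intersect bucket pairs.
		for (int p = 0; p < PARTITIONS; p++) {
			Set<String> firstBucket = readAll("f1_part_" + p);
			try (BufferedReader reader = new BufferedReader(new FileReader("f2_part_" + p))) {
				String line;
				while ((line = reader.readLine()) != null) {
					if (firstBucket.remove(line)) { // print each common line once
						System.out.println(line);
					}
				}
			}
		}
	}

	// Write each line of the input into one of PARTITIONS bucket files based on its hash.
	static void partition(String input, String prefix) throws IOException {
		PrintWriter[] writers = new PrintWriter[PARTITIONS];
		try (BufferedReader reader = new BufferedReader(new FileReader(input))) {
			for (int p = 0; p < PARTITIONS; p++) {
				writers[p] = new PrintWriter(prefix + p);
			}
			String line;
			while ((line = reader.readLine()) != null) {
				// floorMod keeps the bucket index non-negative for negative hash codes
				writers[Math.floorMod(line.hashCode(), PARTITIONS)].println(line);
			}
		} finally {
			for (PrintWriter writer : writers) {
				if (writer != null) {
					writer.close();
				}
			}
		}
	}

	static Set<String> readAll(String file) throws IOException {
		Set<String> lines = new HashSet<>();
		try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
			String line;
			while ((line = reader.readLine()) != null) {
				lines.add(line);
			}
		}
		return lines;
	}
}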

- Debasis Jana June 15, 2013

The map function processes the file(s) iteratively, n (maybe 1,000 or more) lines at a time.

- Debasis Jana June 15, 2013

How about a trie? Since each line cannot be very large, say of length 100 over a 256-character ASCII alphabet, trie construction can be O(n log n) and lookup is also O(n log n).
With a hash table, construction can be O(n); however, different strings can have the same hash code, which means that when hash codes collide we also need to compare the strings themselves. Ignoring the collision issue, it is O(n).
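
A rough sketch of the trie idea (the file names are placeholders); for a billion lines, the memory footprint of the nodes would be the real concern.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class CommonLinesByTrie {

	// Each node maps a character to a child and flags the end of a complete line from file 1.
	static class Node {
		Map<Character, Node> children = new HashMap<>();
		boolean endOfLine;
	}

	public static void main(String[] args) throws IOException {
		Node root = new Node();
		// "file1.txt" and "file2.txt" are placeholder paths
		try (BufferedReader reader = new BufferedReader(new FileReader("file1.txt"))) {
			String line;
			while ((line = reader.readLine()) != null) {
				insert(root, line);
			}
		}
		try (BufferedReader reader = new BufferedReader(new FileReader("file2.txt"))) {
			String line;
			while ((line = reader.readLine()) != null) {
				Node node = find(root, line);
				if (node != null && node.endOfLine) {
					System.out.println(line);
					node.endOfLine = false; // print each common line only once
				}
			}
		}
	}

	static void insert(Node root, String line) {
		Node node = root;
		for (char c : line.toCharArray()) {
			node = node.children.computeIfAbsent(c, k -> new Node());
		}
		node.endOfLine = true;
	}

	static Node find(Node root, String line) {
		Node node = root;
		for (char c : line.toCharArray()) {
			node = node.children.get(c);
			if (node == null) {
				return null;
			}
		}
		return node;
	}
}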

- roger wang June 15, 2013

Since both files have a billion lines, I assume they are stored in secondary memory.

I would initially do an external merge sort on both files. Note that this will be a lexicographic sort.

Next, I'll match each line from file 1 with file 2. The matching will now be efficient because both files are sorted. Since you can't keep both files in main memory, you'll have to do the matching in chunks.

Needless to say, this will run in O(n log n) time, but it is guaranteed to work for big files.
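
A sketch of just the comparison phase, assuming both files have already been externally sorted (file1.sorted.txt and file2.sorted.txt are placeholder names): a single sequential, merge-style pass.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CommonLinesSorted {

	public static void main(String[] args) throws IOException {
		// Inputs are assumed to be lexicographically sorted already
		try (BufferedReader a = new BufferedReader(new FileReader("file1.sorted.txt"));
		     BufferedReader b = new BufferedReader(new FileReader("file2.sorted.txt"))) {
			String lineA = a.readLine();
			String lineB = b.readLine();
			String lastPrinted = null;
			while (lineA != null && lineB != null) {
				int cmp = lineA.compareTo(lineB);
				if (cmp < 0) {
					lineA = a.readLine();
				} else if (cmp > 0) {
					lineB = b.readLine();
				} else {
					// common line; skip duplicates so each is printed once
					if (!lineA.equals(lastPrinted)) {
						System.out.println(lineA);
						lastPrinted = lineA;
					}
					lineA = a.readLine();
					lineB = b.readLine();
				}
			}
		}
	}
}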

- Epic_coder June 16, 2013

Nice idea. If there are no duplicate lines in each file, I think you can even merge file1 and file2 into file3 and check whether adjacent lines are the same. As for time complexity, I think you will have to factor in the average length of a line in both files as well. If it were m, the time complexity would be O(mn lg(mn)).

- Murali Mohan June 17, 2013

We can initially store one complete file into a map, with the line as the key. Then start with the second file and try to insert each of its lines into the map: if the insert fails, print the repeated line and remove it from the map. Even if the line does get inserted into the map, delete that entry afterwards, since that keeps the map size down. So finally the map will hold the non-duplicate entries from both files. There is a chance that line 1 in file 1 matches the last line in file 2, which is why we need to keep at least one map. If there is any issue in retrieving the string due to hashing... we can use double hashing.

- prashanthreddy.mtech June 17, 2013

But what about the size of the hash map? And what about collisions?

- sibendu dey June 29, 2013

Hi,
This solution will be suitable for this question.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class ReadCommonLinesFromFile {

	public static void main(String args[]) throws IOException {
		File f1 = new File("E:\\Text1.txt");
		File f2 = new File("E:\\Text2.txt");
		String name = null;

		BufferedReader buffReader1 = new BufferedReader(new FileReader(f1));
		BufferedReader buffReader2 = new BufferedReader(new FileReader(f2));

		// Load every line of each file into its own set
		Set<String> set1 = new HashSet<String>();
		Set<String> set2 = new HashSet<String>();

		while ((name = buffReader1.readLine()) != null) {
			set1.add(name);
		}
		while ((name = buffReader2.readLine()) != null) {
			set2.add(name);
		}
		buffReader1.close();
		buffReader2.close();

		// Keep only the lines that appear in both files
		set1.retainAll(set2);

		System.out.println(set1);
	}

}

I hope this will be helpful.

- Chandarasekaran M April 16, 2014