Amazon Interview Question for SDE1s


Team: Advertising
Country: United States
Interview Type: In-Person





Implementation of an LRU (least recently used) cache

package cache;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class LRUCache {
	private final Map<String, List<Long>> map;
	private final int capacity;
	private final float loadFactor = 0.75f;

	public LRUCache(int capacity) {
		this.capacity = capacity;
		// accessOrder = true makes LinkedHashMap move an entry to the tail
		// on every get(), which turns insertion order into LRU order.
		map = Collections
				.synchronizedMap(new LinkedHashMap<String, List<Long>>(
						this.capacity, loadFactor, true) {
					private static final long serialVersionUID = 1L;

					// Called after every put()/putAll(); returning true
					// evicts the eldest (least recently used) entry.
					@Override
					protected boolean removeEldestEntry(
							Map.Entry<String, List<Long>> eldest) {
						return size() > capacity;
					}
				});
	}

	public List<Long> get(String key) {
		return map.get(key);
	}

	public void put(String key, List<Long> value) {
		map.put(key, value);
	}

	public void remove(String key) {
		map.remove(key);
	}

	// Return a snapshot: iterating the live entrySet() view of a
	// synchronizedMap without holding the map's monitor is not thread-safe.
	public Collection<Map.Entry<String, List<Long>>> getValues() {
		synchronized (map) {
			return new ArrayList<Map.Entry<String, List<Long>>>(
					map.entrySet());
		}
	}
}
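
For illustration, a quick usage sketch of the class above (the query strings and ad IDs are made up):

package cache;

import java.util.Arrays;

// Demonstration of the eviction behavior; queries and ad IDs are made up.
public class LRUCacheDemo {
	public static void main(String[] args) {
		LRUCache cache = new LRUCache(2);
		cache.put("comedy", Arrays.asList(101L, 102L, 103L));
		cache.put("drama", Arrays.asList(201L, 202L));

		cache.get("comedy");                      // touch "comedy": now most recently used
		cache.put("sports", Arrays.asList(301L)); // evicts "drama", the LRU entry

		System.out.println(cache.get("drama"));   // null: evicted
		System.out.println(cache.get("comedy"));  // [101, 102, 103]
	}
}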

I just kept the size of the cache small for simplicity. But over time, there would be more readers than writers of the cache. If multiple readers can read the cache at the same time (as far as the locking scheme allows), cache throughput improves. We can therefore choose either ConcurrentHashMap (lock striping) or a ReentrantReadWriteLock, as coded below.

package cache;

import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public final class EfficientLRUCache {
	private final Map<String, List<Long>> map;
	private final int capacity;
	private final float loadFactor = 0.75f;
	private final ReadWriteLock lock = new ReentrantReadWriteLock();
	private final Lock readLock = lock.readLock();
	private final Lock writeLock = lock.writeLock();

	public EfficientLRUCache(int capacity) {
		this.capacity = capacity;
		map = new LinkedHashMap<String, List<Long>>(
						this.capacity, loadFactor, true) {
					private static final long serialVersionUID = 1L;

					@Override
					protected boolean removeEldestEntry(
							Map.Entry<String, List<Long>> eldest) {
						return size() > capacity;
					}
				};
	}

	// Caution: in access-order mode, LinkedHashMap mutates its internal
	// linked list on every get() (the entry moves to the tail), so a read
	// lock is NOT sufficient here; concurrent gets would corrupt the map.
	public List<Long> get(String key) {
		writeLock.lock();
		try {
			return map.get(key);
		} finally {
			writeLock.unlock();
		}
	}

	public void put(String key, List<Long> value) {
		writeLock.lock();
		try {
			map.put(key, value);
		} finally {
			writeLock.unlock();
		}
	}

	public void remove(String key) {
		writeLock.lock();
		try {
			map.remove(key);
		} finally {
			writeLock.unlock();
		}
	}

	// Iteration does not reorder entries, so the read lock suffices here;
	// return a snapshot rather than the live entrySet() view, which would
	// otherwise escape the lock.
	public Collection<Map.Entry<String, List<Long>>> getValues() {
		readLock.lock();
		try {
			return new ArrayList<Map.Entry<String, List<Long>>>(
					map.entrySet());
		} finally {
			readLock.unlock();
		}
	}
}

However, if we choose ConcurrentHashMap, we lose the LRU capability that comes built in with LinkedHashMap. (Note also that because an access-order LinkedHashMap reorders entries on get(), even reads in the code above must take the write lock, so the read lock only pays off for iteration-style operations such as getValues().)
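
One possible middle ground, sketched below under stated assumptions, is to apply the lock-striping idea to the LRU structure itself: hash each key to one of several small synchronized access-order LinkedHashMaps, so operations on different shards never contend. Eviction then becomes LRU per shard, i.e. only approximately global LRU; the shard count, class name, and hashing scheme here are illustrative, not from the original answer.

package cache;

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: approximate LRU via lock striping. Keys are hashed to one of
// several small access-order LinkedHashMaps; each shard is locked
// independently, so different shards can be accessed concurrently.
public final class StripedLRUCache {
	private static final int SHARDS = 16; // illustrative shard count

	private final Map<String, List<Long>>[] shards;

	@SuppressWarnings("unchecked")
	public StripedLRUCache(int capacity) {
		final int perShard = Math.max(1, capacity / SHARDS);
		shards = new Map[SHARDS];
		for (int i = 0; i < SHARDS; i++) {
			shards[i] = new LinkedHashMap<String, List<Long>>(
					perShard, 0.75f, true) {
				private static final long serialVersionUID = 1L;

				@Override
				protected boolean removeEldestEntry(
						Map.Entry<String, List<Long>> eldest) {
					return size() > perShard; // LRU eviction per shard
				}
			};
		}
	}

	private Map<String, List<Long>> shardFor(String key) {
		return shards[(key.hashCode() & 0x7fffffff) % SHARDS];
	}

	public List<Long> get(String key) {
		Map<String, List<Long>> shard = shardFor(key);
		synchronized (shard) { // get() reorders entries, so it must be exclusive
			return shard.get(key);
		}
	}

	public void put(String key, List<Long> value) {
		Map<String, List<Long>> shard = shardFor(key);
		synchronized (shard) {
			shard.put(key, value);
		}
	}
}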

- Hope September 26, 2014

Do you think the list held in the cache could be a partial result for the query? For example, on a query, could 10 ad IDs be pulled from the cache and 5 more from the disk/DB? Or is it all or nothing: either the entire list is in the cache or none of it?

- interesting problem September 29, 2014

One issue seems to be sharing the same set of ad IDs across keys (queries). If the intersection of the lists across all the queries is a large set, then reducing duplication would be important. If you maintain a list<long> per query, you could end up with a large number of duplicate ad IDs across the lists. A list<long*> doesn't save space either, since each 64-bit pointer just points to a 64-bit long. But with a list<short>, each short would be a key into a shared map containing the actual 64-bit ad IDs, as sketched below.
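
A minimal sketch of that interning idea, assuming a single shared table (the class and method names are illustrative, not from the original comment). Per-query result lists would then store compact indices instead of duplicated 64-bit IDs:

package cache;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: intern 64-bit ad IDs into compact indices so per-query result
// lists can store small indices instead of duplicated 8-byte IDs.
public final class AdIdInterner {
	private final Map<Long, Integer> idToIndex = new HashMap<Long, Integer>();
	private final List<Long> indexToId = new ArrayList<Long>();

	// Returns the compact index for adId, assigning a new one if unseen.
	public synchronized int intern(long adId) {
		Integer index = idToIndex.get(adId);
		if (index == null) {
			index = indexToId.size();
			indexToId.add(adId);
			idToIndex.put(adId, index);
		}
		return index;
	}

	// Maps a compact index back to the full 64-bit ad ID.
	public synchronized long resolve(int index) {
		return indexToId.get(index);
	}
}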

- sharing across queries September 29, 2014

Yes, definitely; it could be partial. Depending on the constraints/requirements of your application, a search query may return a huge number of ad IDs. In order not to fill up the in-memory cache quickly, we use pagination. Pagination introduces two extra parameters to the system by default (the naming can differ): limit, the number of elements the query returns at a time (ad IDs in this case), and offset, the position from which the search should continue.

Having said that, let's walk through an example.
<HOST>/search?q="Comedy"&offset=0&limit=25 => returns the first 25 ad IDs for this search query
<HOST>/search?q="Comedy"&offset=25&limit=25 => returns the next 25 ad IDs for the same search query
(Assume offset and limit are populated automatically by the UI.)

These two queries will stay in the cache as separate entries. Whenever the same search query is requested again, the same result will be delivered.
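
For illustration, a tiny sketch of how such a cache key could be built so each (q, offset, limit) combination maps to its own entry (the delimiter, class, and method names are assumptions):

package cache;

// Sketch: each (query, offset, limit) combination gets its own cache key,
// so the two example requests above become distinct LRU entries.
final class CacheKeys {
	static String cacheKey(String query, int offset, int limit) {
		return query + "|" + offset + "|" + limit; // delimiter is an assumption
	}
	// cacheKey("Comedy", 0, 25)  -> "Comedy|0|25"
	// cacheKey("Comedy", 25, 25) -> "Comedy|25|25"
}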

The question becomes: how does the cached result of a search query ever get refreshed if it is accessed all the time, i.e. if LRU hasn't discarded it for a long time?

To cover this scenario, I'd add a TTL (time to live) feature to the solution I provided earlier. A single TTL can be used globally, or a TTL per search-query entry. Let's consider a global TTL. Say we want our customers to see newly added ads right away. We can keep a created-at time per search-query entry in the cache, and once the TTL has elapsed we can invalidate the search result and remove it from the cache [removeEldestEntry is a good place to do this]. When the same search query is requested again, the fresh result of the query is populated into the cache and used for the next TTL period.
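
A minimal sketch of the global-TTL variant, assuming expiry is checked lazily on get() and an expired entry is simply treated as a miss (the class names and TTL value are illustrative):

package cache;

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: LRU cache with a global TTL. Expiry is checked lazily on get();
// an expired entry is removed and reported as a miss, so the caller
// re-runs the search and put()s the fresh result.
public final class TtlLRUCache {
	private static final class Timestamped {
		final List<Long> adIds;
		final long createdAt = System.currentTimeMillis();

		Timestamped(List<Long> adIds) {
			this.adIds = adIds;
		}
	}

	private final Map<String, Timestamped> map;
	private final long ttlMillis;

	public TtlLRUCache(final int capacity, long ttlMillis) {
		this.ttlMillis = ttlMillis; // e.g. 5 * 60 * 1000 for 5 minutes
		map = new LinkedHashMap<String, Timestamped>(capacity, 0.75f, true) {
			private static final long serialVersionUID = 1L;

			@Override
			protected boolean removeEldestEntry(
					Map.Entry<String, Timestamped> eldest) {
				return size() > capacity; // plain LRU eviction as before
			}
		};
	}

	public synchronized List<Long> get(String key) {
		Timestamped entry = map.get(key);
		if (entry == null) {
			return null;
		}
		if (System.currentTimeMillis() - entry.createdAt > ttlMillis) {
			map.remove(key); // stale: evict and report a miss
			return null;
		}
		return entry.adIds;
	}

	public synchronized void put(String key, List<Long> value) {
		map.put(key, new Timestamped(value));
	}
}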

- Hope September 29, 2014

