Google Interview Question
Software Engineer in Test
Country: Switzerland
Interview Type: In-Person
I did this in Go by using a combination of a map and container/list, which implements a linked list, to create an LRU cache. It's made thread-safe by embedding a sync.Mutex. Assuming "the 10 most recently requested images" means that we should bump up existing entries when they're requested again, every request needs to Lock() the mutex to avoid races. If, on the other hand, it's just the age of the images that matters, this can be converted to use a sync.RWMutex so that only cache misses need to Lock().
For this example, I'm mocking up the image data with just a call to time.Now().UnixNano().
package main

import (
    "container/list"
    "fmt"
    "sync"
    "time"
)

const (
    cacheSize = 10
)

type Image struct {
    name string
    data string
}

type ImageCache struct {
    sync.Mutex
    imgIdx  map[string]*list.Element
    imgList *list.List
}

func NewImageCache() *ImageCache {
    return &ImageCache{
        imgIdx:  make(map[string]*list.Element),
        imgList: list.New(),
    }
}

func (ic *ImageCache) Request(imageName string) string {
    ic.Lock()
    defer ic.Unlock()
    i, ok := ic.imgIdx[imageName]
    // If we requested an image we don't have
    if !ok {
        // Have we hit the maximum size?
        if ic.imgList.Len() >= cacheSize {
            // Remove the oldest entry from the list and the map
            delete(ic.imgIdx, ic.imgList.Remove(ic.imgList.Back()).(*Image).name)
        }
        newImage := &Image{
            name: imageName,
            data: fmt.Sprintf("%s:%d", imageName, time.Now().UnixNano()),
        }
        // Push the new image to the front and update the map
        ic.imgIdx[imageName] = ic.imgList.PushFront(newImage)
        return newImage.data
    }
    // Push the accessed image to the front
    ic.imgList.MoveToFront(i)
    return i.Value.(*Image).data
}

func main() {
    // runtime.GOMAXPROCS(runtime.NumCPU())
    ic := NewImageCache()
    // wg := new(sync.WaitGroup)
    for i := 0; i <= cacheSize; i++ {
        // wg.Add(1)
        // go func(i int) {
        fmt.Println(ic.Request(fmt.Sprintf("image%d", i)))
        // 	wg.Done()
        // }(i)
    }
    // wg.Wait()
    fmt.Println("---")
    for i := cacheSize; i >= 0; i-- {
        fmt.Println(ic.Request(fmt.Sprintf("image%d", i)))
    }
}
public class ImageCacheRetriever {
    private static final int CACHE_MAX_SIZE = 10;
    ImageCache cache = new ImageCache(CACHE_MAX_SIZE);

    public Image retrieve(final long id) {
        Image cacheImage = cache.getImage(id);
        if (cacheImage == null) {
            final Image diskImage = getFromDisk(id);
            new Thread(new Runnable() {
                @Override
                public void run() {
                    cache.put(id, diskImage);
                }
            }).start(); // start(), not run(): run() would execute on the calling thread
            return diskImage;
        }
        return cacheImage;
    }

    private Image getFromDisk(long id) {
        return new Image();
    }
}
class ImageCache {
    private ImageCacheEntry[] cache;
    int cacheSize = 0;
    int cacheHead = 0;
    private int cacheMaxSize;
    Object monitor = new Object();

    public ImageCache(int cacheMaxSize) {
        this.cacheMaxSize = cacheMaxSize;
        cache = new ImageCacheEntry[cacheMaxSize];
    }

    public Image getImage(long id) {
        synchronized (monitor) {
            for (int i = 0; i < cacheSize; i++) {
                if (cache[i].id == id) {
                    return cache[i].data;
                }
            }
            return null;
        }
    }

    public void put(long id, Image data) {
        synchronized (monitor) {
            ImageCacheEntry newEntry = new ImageCacheEntry(id, data);
            if (cacheSize < cacheMaxSize) {
                cache[cacheSize] = newEntry;
                cacheSize++; // without this, getImage never sees any entries
            } else {
                cache[cacheHead] = newEntry;
                cacheHead = (cacheHead + 1) % cacheMaxSize;
            }
        }
    }

    public static class ImageCacheEntry {
        protected long id;
        protected Image data;

        public ImageCacheEntry() {
            id = -1;
        }

        public ImageCacheEntry(long id, Image data) {
            this.id = id;
            this.data = data;
        }
    }
}
How do you store users and their 10 most recent requests efficiently? Also, aren't the requests per individual user, in which case you don't need monitors?
The implemented cache does not hold the most recent requests: it overwrites entries in FIFO order. A proper LRU cache should behave like a priority queue where the removed item is the least recently used one.
The images are SHA-hashed and their long ids are obtained; each user has a unique user id, which is stored in dictionary order. Each user has a hash table of their 10 most recent image requests <Integer, Long id>. If it exceeds 10 entries, the least recent one is deleted. This is just image retrieval, hence I don't think synchronization is necessary. Please correct me if I am wrong.
Assuming each image is queried by its path, I would use two data structures -
1. HashMap<String(path), Blob(data)>
2. DoublyLinkedList<String(path)>
Explanation:
When a request for an image path is received, check the HashMap -
if the path already exists:
(a) remove the path from the doubly linked list and put it at the beginning
(b) return the blob data of the image
if the path doesn't exist in the HashMap:
(a) fetch the image data
(b) put <path, blob_data_of_the_image> in the HashMap
(c) if the cache is full, remove the last element from the doubly linked list and delete its entry from the HashMap
(d) insert the path at the beginning of the doubly linked list
(e) return the blob data
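The steps above can be sketched in Java. This is a minimal sketch: the class and method names are illustrative, and the loader callback stands in for the real image fetch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class LruImageCache {
    private static class Node {
        String path;
        String blob;
        Node prev, next;
        Node(String path, String blob) { this.path = path; this.blob = blob; }
    }

    private final int capacity;
    private final Map<String, Node> index = new HashMap<>(); // path -> list node
    private final Node head = new Node(null, null); // sentinel: most recent side
    private final Node tail = new Node(null, null); // sentinel: least recent side

    LruImageCache(int capacity) {
        this.capacity = capacity;
        head.next = tail;
        tail.prev = head;
    }

    // Request an image by path; loader supplies the blob on a miss.
    synchronized String request(String path, Function<String, String> loader) {
        Node n = index.get(path);
        if (n != null) {               // hit: move to front, return cached blob
            unlink(n);
            insertAfterHead(n);
            return n.blob;
        }
        if (index.size() >= capacity) { // full: evict the least recently used
            Node oldest = tail.prev;
            unlink(oldest);
            index.remove(oldest.path);
        }
        Node fresh = new Node(path, loader.apply(path));
        insertAfterHead(fresh);
        index.put(path, fresh);
        return fresh.blob;
    }

    private void unlink(Node n) {
        n.prev.next = n.next;
        n.next.prev = n.prev;
    }

    private void insertAfterHead(Node n) {
        n.next = head.next;
        n.prev = head;
        head.next.prev = n;
        head.next = n;
    }
}
```

Both the HashMap lookup and the list moves are O(1), which is the point of pairing the two structures.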
Some brainstorming. The challenge is to support multi-threading. Besides locking, I am thinking about a lock-free design.
1. Use the following data structures:
a. A hashmap (call it the image map) whose key is the image name and whose value is the doubly linked list node. Assumption: each image has a different name, so names can be used as hash keys.
b. A doubly linked list (call it the position list) that stores the image name and the image's index in the online image array.
c. Two image arrays. At any time one array is online serving requests and the other is offline processing write operations (e.g. image addition/deletion).
d. An image write-operation buffer, essentially a queue. All image write operations are put in this buffer.
e. An image write thread responsible for handling the write requests in the buffer and updating the offline image array. After updating, this thread takes the original online array offline and puts the original offline array online. This design ensures high availability - the cache always responds to image read requests.
The image map and position list are immutable. Each request thread clones and maintains its own copy of the image map and position list. These two pieces of data are small, so it is OK to clone them repeatedly.
The image array could be big, so it is shared. But there is no lock on it because: (1) we use two rotating image arrays for writing, and a single thread handles those requests in a serialized way; (2) a thread might try to read an image that has been replaced or deleted in the online image array. This is a cache miss. It is OK to return "not found" after comparing the image name from the position list node with the one in the image array; it should never return a wrong image. In this case, the request thread should look up the online image array to check whether the wanted image is in the cache but in a different slot. If not, it sends an image-adding request to the write buffer and returns.
For all image write operations, the request threads just fire and forget.
This is a start, and there are many more details to add - for example, how a request thread updates the hash table and position list when an image is deleted or changes slots in the image array, and how to reduce cache misses.
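A minimal sketch of the double-buffer part of this idea, assuming a single writer thread; the names here are illustrative, and the map/list cloning and cache-miss handling described above are omitted:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

class DoubleBufferedCache {
    static class Write {
        final int slot;
        final String image;
        Write(int slot, String image) { this.slot = slot; this.image = image; }
    }

    private final String[] bufA, bufB;
    private final AtomicReference<String[]> online; // readers follow this reference
    private final BlockingQueue<Write> writes = new ArrayBlockingQueue<>(64);

    DoubleBufferedCache(int slots) {
        bufA = new String[slots];
        bufB = new String[slots];
        online = new AtomicReference<>(bufA);
    }

    // Lock-free read: only ever touches the online array.
    String read(int slot) {
        return online.get()[slot];
    }

    // Fire-and-forget write, as described in the post.
    void submitWrite(int slot, String image) {
        writes.offer(new Write(slot, image));
    }

    // One step of the single writer thread: copy current state to the offline
    // array, apply all pending writes there, then swap online/offline.
    void drainAndSwap() {
        String[] current = online.get();
        String[] offline = (current == bufA) ? bufB : bufA;
        System.arraycopy(current, 0, offline, 0, current.length);
        Write w;
        while ((w = writes.poll()) != null) {
            offline[w.slot] = w.image;
        }
        online.set(offline);
    }
}
```

Readers never block; they may briefly see a stale array, which is exactly the "acceptable cache miss" trade-off the post describes.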
# Written in Python using deque.
from collections import deque

class Cache10Images(object):
    def __init__(self):
        # deque(maxlen=10) would also enforce the size limit automatically
        self.imagelist = deque()

    def append(self, new_image):
        self.imagelist.appendleft(new_image)
        if len(self.imagelist) > 10:  # keep at most 10 images
            self.imagelist.pop()

    def get(self, image_at_position):
        return self.imagelist[image_at_position]
Use a LinkedHashMap to keep the keys ordered by insertion.
Every time you reach the limit (10 in this case), you remove the first-inserted (oldest) key.
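Java's LinkedHashMap (presumably the class meant here) supports this directly via removeEldestEntry. Note that plain insertion order gives FIFO eviction; passing accessOrder = true to the constructor, as in this sketch, upgrades it to true LRU:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruMap<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruMap(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get() moves entries to the back
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently accessed entry
    }
}
```

With accessOrder = true, a get() reorders the entry, so the "eldest" entry is always the least recently used one.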
- Felipe Cerqueira June 28, 2015