## Interview Question

Principal Software Engineers**Country:**United States

**Interview Type:**Phone Interview

@NoOne: LRU is significantly easier, because you needn't maintain a sorted order. Here you have a ranking function that gives you an arbitrary order. So, you can't do it with a HT poitinig to linked list elements except you accept O(n) time to change to position of the element that gets hit and receives a new arbitrary rank value.

With LRU, the rank value would always be the biggest, so you can maintain the order without comparing anything...

It's similar but not the same. The question is actually therefore a bit less annoying ;-)

@ChrisK, for part 2 I think that custom implementation of a binary heap with pointers to bubble up elements when rank changes would be the most efficient way of handling it, although I definitely don't think this would've been able to be implemented over a 30 mins phone interview.

I can also think of a solution using a Treap instead of a binary heap where you would use the entry key as the node's key and the entry's rank as the node's priority, which would allow us to do all the same operations as the heap but it would take O(2ln(n)) to update the rank of an entry by looking up a node by key and removing it, and then reinserting it with a new rank.

@ChrisK, ok now I get it. They want to generalise Rank :

[ people.csail.mit.edu/sanchez/papers/2016.model.hpca.pdf ] something like this.

Given a generic function, that can shuffle the order *completely* - I rather doubt there is any point - unless we specify what sort of property the ranking function has.

@NoOne: I don't understand your last statement, what do you need to specify on the ranking function properties? It's a ranking function, it will return a rank value and it is told that the rank value of a single item changes on get (not all, no complete re-shuffling). We can assume it's an integer or what ever, at the end what we need is top k of the rank values and it's associated items in the cache (the same rank value discussion is a detail).

I apologize for the confusion

The first case is simple, I was asked to implement the cache using fixed ranking for cache entries.

The second case is complicated. When the entry is looked up, the getEntryRank is called to get the updated rank value. If its a LRU cache, we assume that the new rank will be minimum of all the existing cache entries. For this case, we can go about using a hashtable with value as pointer to this entry in a sorted list. The back/front(your choice) of the list will be LRU entriy for eviction. Hence, the operation for look up will be O(1), O(1) hashtable lookup, and O(1) to move the entry for current to front of the list. This move involves erasing the entry from list and move to front.

But, I was specifically told that the getEntryRank can return any value. The new rank value can be lower or higher than its existing rank value. Thus maintaining the ordered list by ranking will result in O(n) look up operating as we have to move the entry iteratively to right spot in the list. I proposed a binary tree associated data structure instead of the list. Here the binary tree look up (keep a pointer to entry in the binary tree in the hast_table, this mean O(1) look up in the binary tree, erase the entry, get the new rank and insert back into the binary tree, thus the lookup will cost O(logn))

Can you work out O(1) cost for this lookup if the new rank is random.

@ali.kheam:

my thoughts:

- There is no method that sorts in O(k), given k is the fixed cache size except variations of radix sort (if the values have limited range)

- If k is e.g. 3 a hashtable is overkil, maybe that was the intention

- If the ranking function has a low range (e.g. 0-254) you could construct a radix sort based DS with nested hashtables etc. It's possible but it's not very nice... that wouold have O(1) on all operations, so maybe you should have asked what is the range of the ranking function and what is the size of the cache, is it as well constant, etc...

I try to rephrase what I understood:

- create an associative cache (a lookup, key value store, ...)

- in the cache there are items, I assume some objects with attributes (e.g. an Item)

- items have a "rank" which I assume, high ranked items should be kept in the cache over low ranked items

- the cache has a size limit, e.g. maxSize_

- get(key) method: get's an element from the cache or nullptr if not there

- put(item) method: places the element in the cache and if cache would grow over max_size removes the lowest ranked element (which might be the one just added)

Assumptions:

- there are no two element with the same rank, if so one can evict one of the two elements if their rank score is the lowest

- rank is an integer value

in your example there is a db_read_entry(key) method, which I assume is from a data access component. I would leave DB aspects out since it is not relevant for the cache functionality

Solution for the first question I would solve similar to you. get is trivial put would be:

the only thing to mention is that it might be slightly stupid if the elment being added has lower rank than top of the queue because it is added and then removed again.

- Chris July 28, 2017For the follow up the question it is a bit tricky, because you want to actually change the priority of an item in the queue. Most standard containers have a problem with this because you have no random access to the item.

One way around it can be to just add another item to the queue with the new priority. Since you need to keep the HT size constant, the queue will eventually be cleaned, but there is a risk that the queue is significantly larger than the HT. It is not optimal.

If you implement a binary heap your self, you can solve this issue and place pointers to the binary heap elements into the HT as values. If you change the priority of an item, you just bubble it up or down in the heap and you will get to the element by O(1) coming from the HT. I once implemented such a binary heap as an exercise. I maintained the index into the heap-vector on the "iterators" pointing to the heap elements. That turned out to be quite optimal. However, I never found anybody doing it the same way. E.g. when you look at Dijkstra implementations people usually just add queue items with lower priority and accept the queue growing larger than needed (I think it's not relevant).

your statement "create an associated binary tree(std::set) with the hashmap(map of entry as key, and the iterator entry in the std::set of entries)" I didn't understand.

Do you mean a hashtable with the item-id as key and the tree-iterator as value. In the tree (you will need a map, not a set) you would have the rank as keys and the item as values"?

like:

unordered_map<int, map<int, int>::iterator>> item_id_to_treeItem;

map<int, shared_ptr<item>> rank_to_item;

This works as well, you take the tree's min-element, from there you get the items-id, then you can remove it from the HT. This solution has the same O(lg(n)) properties for the get() and put() as the heap due to maintenance of tree, but the tree needs to rebalance etc. which is heavier on constants than binary heap that can be implemented in a vector which is much heap- and cachefriendlier than trees.