CareerCup

Interview Question for Principal Software Engineers

I was asked this question for the above role:
Create a fixed-size, fully associative cache. Entries are evicted based on their rank. For any entry added, the function int getEntryRank(entry) returns its rank, which does not change on lookup.
db_read_entry() reads the entry from the db.

Part 2) The rank changes on lookup.

My solution was:
a hashtable (unordered_set<entry>)

and a priority queue<pair<int, entry>>

lookup(key) {
  // hashtable lookup
  if found, return the entry  (O(1))

  // otherwise (not found)
  if limit reached, then
    // evict the priority queue top and erase the same entry from the hash set  (O(logn))

  entry e = db_read_entry(key);  // expensive op
  // insert e into the hash set
  // insert pair<int, entry>(getEntryRank(e), e) into the priority queue  (O(logn))
  return e;
}
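
For readers who want something compilable, the Part 1 pseudocode above could be sketched roughly as follows. This is only a sketch under assumptions: the Entry struct, the int key, and the bodies of db_read_entry() and getEntryRank() are hypothetical stand-ins, the lowest rank is assumed to be evicted first, and the capacity is assumed to be at least 1.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the functions named in the question.
struct Entry { int key; std::string value; };
inline Entry db_read_entry(int key) { return Entry{key, "value-" + std::to_string(key)}; }
inline int getEntryRank(const Entry& e) { return e.key % 10; }  // assumed: any fixed rank

class FixedRankCache {
public:
    explicit FixedRankCache(std::size_t capacity) : capacity_(capacity) {}

    const Entry& lookup(int key) {
        auto it = table_.find(key);
        if (it != table_.end()) return it->second;      // hit: O(1) on average
        if (table_.size() >= capacity_ && !heap_.empty()) {
            table_.erase(heap_.top().second);           // evict the lowest-ranked key
            heap_.pop();                                // O(log n)
        }
        Entry e = db_read_entry(key);                   // expensive operation
        heap_.emplace(getEntryRank(e), key);            // O(log n)
        return table_.emplace(key, std::move(e)).first->second;
    }

    bool contains(int key) const { return table_.count(key) != 0; }
    std::size_t size() const { return table_.size(); }

private:
    using RankedKey = std::pair<int, int>;              // (rank, key)
    std::size_t capacity_;
    std::unordered_map<int, Entry> table_;
    // std::greater turns the default max-heap into a min-heap on rank.
    std::priority_queue<RankedKey, std::vector<RankedKey>, std::greater<RankedKey>> heap_;
};
```

Because ranks never change in Part 1, the heap and the hash table always hold exactly the same keys, so popping the heap top is enough to find the eviction victim.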

Part 2: The rank changes on lookup
   Since the priority queue does not allow the rank of an arbitrary entry to be modified, and only provides access to the top entry, I proposed changing the data structure associated with the hash set.

   My proposal was: pair the hashmap with a balanced binary tree (std::set). The hashmap uses the entry as the key and stores an iterator into the std::set as the value. I used the iterator to avoid searching the set on every lookup in the hashmap.
The set contains [rank information + entry]. On lookup, the entry's node in the set is reached through the stored iterator (O(1)) and erased, and then reinserted with the recalculated rank value. The iterator stored in the hashmap is also updated to this new iterator.

When the limit is reached, the minimum element of the tree is taken. From that element (the rank and the entry), the entry is looked up in the hashmap and removed from there too.
The rest of the process is the same as the lookup described above.
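
A rough sketch of the hashmap-plus-std::set idea described above, under assumptions: keys are ints, the rank function is passed in as a callback (a hypothetical stand-in for getEntryRank), the lowest rank is evicted first, and the actual db read and stored value are left out to keep it short.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <unordered_map>
#include <utility>

class RankedCache {
public:
    using RankFn = std::function<int(int)>;  // key -> current rank (hypothetical)

    RankedCache(std::size_t capacity, RankFn rank)
        : capacity_(capacity), rank_(std::move(rank)) {}

    // Returns true on a hit, false on a miss (the key is cached afterwards).
    bool lookup(int key) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            byRank_.erase(it->second);                            // amortized O(1) via iterator
            it->second = byRank_.emplace(rank_(key), key).first;  // reinsert: O(log n)
            return true;
        }
        if (index_.size() >= capacity_ && !byRank_.empty()) {
            auto lowest = byRank_.begin();                        // smallest (rank, key)
            index_.erase(lowest->second);
            byRank_.erase(lowest);
        }
        // A real cache would call db_read_entry(key) here and store the value.
        index_.emplace(key, byRank_.emplace(rank_(key), key).first);
        return false;
    }

    bool contains(int key) const { return index_.count(key) != 0; }

private:
    using RankSet = std::set<std::pair<int, int>>;                // (rank, key)
    std::size_t capacity_;
    RankFn rank_;
    RankSet byRank_;
    std::unordered_map<int, RankSet::iterator> index_;
};
```

Storing (rank, key) pairs keeps set elements unique even when two keys share a rank, and std::set iterators stay valid when other elements are inserted or erased, which is what makes caching them in the hashmap safe.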

I believed my solution was fair enough:
O(1) lookup on a hit.
If not found (with constant rank), the lookup costs O(logn) + db access time. The db access time dominates the O(logn), so the logn (to insert into the priority queue, or to pop and insert) does not matter.

For changing rank:
Locating the entry in the binary tree is O(1), since the hash_map holds the iterator.
Updating the rank: removing the rank entry from the binary tree through the iterator is amortized O(1), and the insert is O(logn).
Hence each operation (lookup, or evict and lookup, excluding the db access) takes O(logn) instead of O(1).

I was rejected because the solution was not optimized (the performance was not good enough). I am wondering whether the solution really was not good, considering it was a phone interview and I had to work within a time frame of 25 minutes, or whether the interview process is just unrealistic.

Please share your solution so that I can see where I failed.

- ali.kheam July 28, 2017 in United States | Report Duplicate | Flag |
Principal Software Engineer Data Structures

Country: United States
Interview Type: Phone Interview

Let me rephrase what I understood:
- create an associative cache (a lookup, key-value store, ...)
- the cache holds items, which I assume are objects with attributes (e.g. an Item)
- items have a "rank"; I assume high-ranked items should be kept in the cache over low-ranked items
- the cache has a size limit, e.g. maxSize_
- get(key) method: gets an element from the cache, or nullptr if it is not there
- put(item) method: places the element in the cache and, if the cache would grow over maxSize_, removes the lowest-ranked element (which might be the one just added)

Assumptions:
- no two elements have the same rank; if they do, either of the tied elements may be evicted when their rank is the lowest
- rank is an integer value

In your example there is a db_read_entry(key) method, which I assume comes from a data access component. I would leave the DB aspects out, since they are not relevant to the cache functionality.

I would solve the first question similarly to you. get is trivial; put would be:

void put(const shared_ptr<Item>& item) {
	// queue_ is assumed to be a min-heap on rank (e.g. a priority_queue
	// with greater<>), so that top() is the lowest-ranked element
	if (ht_.find(item->getId()) == ht_.end()) {
		ht_.emplace(item->getId(), item);
		queue_.emplace(item->getRank(), item->getId());
	}

	// evict the lowest-ranked element if the cache grew over maxSize_
	if (queue_.size() > maxSize_) {
		auto qitem = queue_.top(); // get lowest-ranked element
		queue_.pop(); // remove it from the queue
		ht_.erase(qitem.second); // remove by id
	}
}

The only thing to mention is that it is slightly wasteful when the element being added has a lower rank than the top of the queue, because it is added and then immediately removed again.

The follow-up question is a bit trickier, because you actually want to change the priority of an item that is already in the queue. Most standard containers have a problem with this because they give you no random access to the item.

One way around it is to just add another item to the queue with the new priority and treat the old one as stale. Since the HT size is kept constant, stale items are eventually cleaned out of the queue, but there is a risk that the queue grows significantly larger than the HT. It is not optimal.
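
A minimal sketch of that "add another queue item" workaround, assuming int keys, lowest rank evicted first, and a hypothetical touch(key, newRank) call that the cache would make after asking the ranking function. Stale queue records are recognised by comparing them with the current rank stored in the hash table.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

class LazyRankCache {
public:
    explicit LazyRankCache(std::size_t capacity) : capacity_(capacity) {}

    // Inserts the key, or gives an existing key a new rank.
    void touch(int key, int newRank) {
        if (rank_.count(key) == 0 && rank_.size() >= capacity_) evictLowest();
        rank_[key] = newRank;
        heap_.emplace(newRank, key);  // any older record for this key is now stale
    }

    bool contains(int key) const { return rank_.count(key) != 0; }
    std::size_t heapSize() const { return heap_.size(); }

private:
    using Record = std::pair<int, int>;  // (rank, key)

    void evictLowest() {
        while (!heap_.empty()) {
            Record top = heap_.top();
            heap_.pop();
            auto it = rank_.find(top.second);
            // Only a record that matches the key's current rank is live.
            if (it != rank_.end() && it->second == top.first) {
                rank_.erase(it);
                return;
            }
        }
    }

    std::size_t capacity_;
    std::unordered_map<int, int> rank_;  // key -> current rank
    std::priority_queue<Record, std::vector<Record>, std::greater<Record>> heap_;
};
```

The heap in this sketch is never compacted, so it can hold many more records than the cache has entries if the same keys are hit repeatedly between evictions.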

If you implement a binary heap yourself, you can solve this issue by placing pointers to the binary heap elements into the HT as values. If you change the priority of an item, you just bubble it up or down in the heap, and you get to the element in O(1) coming from the HT. I once implemented such a binary heap as an exercise. I maintained the index into the heap vector on the "iterators" pointing to the heap elements. That turned out to work quite well. However, I have never found anybody doing it the same way. E.g. when you look at Dijkstra implementations, people usually just add queue items with the new, lower priority and accept the queue growing larger than needed (I think it is not relevant there).
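
To make the custom heap idea a bit more concrete, here is a minimal sketch of a min-heap that tracks each key's position so a rank can be changed in place. It is not the implementation mentioned above; the class name, the int keys, and the set()/popMin() interface are assumptions for illustration.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

class IndexedMinHeap {
public:
    bool contains(int key) const { return pos_.count(key) != 0; }
    std::size_t size() const { return heap_.size(); }

    // Inserts a new key or changes the rank of an existing one: O(log n).
    void set(int key, int rank) {
        auto it = pos_.find(key);
        if (it == pos_.end()) {
            heap_.push_back({rank, key});
            pos_[key] = heap_.size() - 1;
            siftUp(heap_.size() - 1);
        } else {
            heap_[it->second].rank = rank;
            siftUp(it->second);
            siftDown(pos_[key]);  // position may have changed during siftUp
        }
    }

    // Removes and returns the lowest-ranked key; the heap must be non-empty.
    int popMin() {
        int key = heap_.front().key;
        swapNodes(0, heap_.size() - 1);
        heap_.pop_back();
        pos_.erase(key);
        if (!heap_.empty()) siftDown(0);
        return key;
    }

private:
    struct Node { int rank; int key; };

    // Swaps two heap slots and keeps the position index in sync.
    void swapNodes(std::size_t a, std::size_t b) {
        std::swap(heap_[a], heap_[b]);
        pos_[heap_[a].key] = a;
        pos_[heap_[b].key] = b;
    }
    void siftUp(std::size_t i) {
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (heap_[parent].rank <= heap_[i].rank) break;
            swapNodes(i, parent);
            i = parent;
        }
    }
    void siftDown(std::size_t i) {
        for (;;) {
            std::size_t smallest = i, l = 2 * i + 1, r = l + 1;
            if (l < heap_.size() && heap_[l].rank < heap_[smallest].rank) smallest = l;
            if (r < heap_.size() && heap_[r].rank < heap_[smallest].rank) smallest = r;
            if (smallest == i) break;
            swapNodes(i, smallest);
            i = smallest;
        }
    }

    std::vector<Node> heap_;                    // binary heap stored in a vector
    std::unordered_map<int, std::size_t> pos_;  // key -> index in heap_
};
```

A cache built on this would call set(key, newRank) on every hit and popMin() when it is full, so the heap never grows beyond the cache size.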

I didn't understand your statement "create an associated binary tree(std::set) with the hashmap(map of entry as key, and the iterator entry in the std::set of entries)".
Do you mean a hashtable with the item id as key and the tree iterator as value? In the tree (you will need a map, not a set) you would have the rank as key and the item as value?

like:
map<int, shared_ptr<Item>> rank_to_item;
unordered_map<int, map<int, shared_ptr<Item>>::iterator> item_id_to_treeItem;

This works as well: you take the tree's min element, get the item's id from it, and then remove that id from the HT. This solution has the same O(lg(n)) cost for get() and put() as the heap, due to the maintenance of the tree. However, the tree needs to rebalance etc., which is heavier on constants than a binary heap, which can be implemented in a vector and is much friendlier to the allocator and the CPU cache than a tree.

- Chris July 28, 2017 | Flag Reply

@ChrisK, I guess [1] stays what it is.
[2] is a pretty simple and annoyingly popular question:
[ careercup.com/question?id=14113740 ]
Implement LRU cache?

- NoOne July 29, 2017 | Flag Reply

@NoOne: LRU is significantly easier, because you needn't maintain a sorted order. Here you have a ranking function that gives you an arbitrary order. So you can't do it with a HT pointing to linked list elements, unless you accept O(n) time to change the position of the element that gets hit and receives a new arbitrary rank value.
With LRU, the new rank value would always be the biggest, so you can maintain the order without comparing anything...
It's similar but not the same. The question is therefore actually a bit less annoying ;-)

- Chris July 29, 2017 | Flag Reply

@ChrisK, for part 2 I think that a custom implementation of a binary heap, with pointers used to bubble elements up or down when their rank changes, would be the most efficient way of handling it, although I definitely don't think this could have been implemented during a 30-minute phone interview.

I can also think of a solution using a Treap instead of a binary heap, where you would use the entry key as the node's key and the entry's rank as the node's priority. That would allow all the same operations as the heap, but updating the rank of an entry would take about 2*log(n) steps (still O(log n) expected): look up the node by key and remove it, then reinsert it with the new rank.

- funk July 29, 2017 | Flag Reply

@ChrisK, ok, now I get it. They want to generalise the rank, something like this:
[ people.csail.mit.edu/sanchez/papers/2016.model.hpca.pdf ]
Given a generic function that can shuffle the order *completely*, I rather doubt there is any point, unless we specify what sort of properties the ranking function has.

- NoOne July 30, 2017 | Flag Reply

@NoOne: I don't understand your last statement. What do you need to specify about the ranking function's properties? It's a ranking function: it returns a rank value, and we are told that the rank value of a single item changes on get (not of all items, so there is no complete re-shuffling). We can assume it's an integer or whatever. In the end, what we need is the top k rank values and their associated items in the cache (how to handle equal rank values is a detail).

- Chris July 30, 2017 | Flag Reply

I apologize for the confusion.
The first case is simple: I was asked to implement the cache using a fixed ranking for cache entries.

The second case is complicated. When an entry is looked up, getEntryRank is called to get the updated rank value. If it is an LRU cache, we can assume that the new rank will be the minimum of all the existing cache entries. For this case, we can use a hashtable whose value is a pointer to the entry in a list ordered by recency. The back/front (your choice) of the list holds the LRU entry for eviction. Hence the lookup is O(1): an O(1) hashtable lookup, plus O(1) to move the current entry to the front of the list. This move involves unlinking the entry from its position in the list and relinking it at the front.
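
For comparison, the LRU special case described above fits in a few lines. This is a minimal sketch assuming int keys, with the front of the list as the most recently used end and the db read left out.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>

class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns true on a hit, false on a miss (the key is cached afterwards).
    bool lookup(int key) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            // Relink the node at the front: O(1), no comparison of ranks needed.
            order_.splice(order_.begin(), order_, it->second);
            return true;
        }
        if (index_.size() >= capacity_ && !order_.empty()) {
            index_.erase(order_.back());  // back of the list is least recently used
            order_.pop_back();
        }
        // A real cache would call db_read_entry(key) here and store the value.
        order_.push_front(key);
        index_[key] = order_.begin();
        return false;
    }

    bool contains(int key) const { return index_.count(key) != 0; }

private:
    std::size_t capacity_;
    std::list<int> order_;                                     // front = most recent
    std::unordered_map<int, std::list<int>::iterator> index_;  // key -> list node
};
```

std::list::splice moves the node without invalidating the iterator stored in the hashtable, which is why the hit path stays O(1).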

But I was specifically told that getEntryRank can return any value. The new rank value can be lower or higher than the existing rank value. Thus, maintaining a list ordered by rank results in an O(n) lookup operation, as we have to move the entry step by step to the right spot in the list. I proposed a binary tree as the associated data structure instead of the list. Here the hash_table keeps a pointer to the entry in the binary tree, which means O(1) to locate the entry in the tree. We then erase the entry, get the new rank and insert it back into the binary tree, so the lookup costs O(logn).
Can you work out an O(1) cost for this lookup if the new rank is random?

- ali.kheam July 30, 2017 | Flag Reply

@ali.kheam:
my thoughts:
- There is no method that sorts in O(k), where k is the fixed cache size, except variations of radix sort (if the values have a limited range)
- If k is e.g. 3, a hashtable is overkill; maybe that was the intention
- If the ranking function has a small range (e.g. 0-254), you could construct a radix-sort-based DS with nested hashtables etc. It's possible but not very nice... That would have O(1) on all operations, so maybe you should have asked what the range of the ranking function is, what the size of the cache is, whether it is constant as well, etc...
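
One possible reading of the small-range idea, sketched with a plain array of buckets rather than nested hashtables (a simplification, not the structure described above). It assumes int keys, ranks limited to 0..maxRank, lowest rank evicted first, and a hypothetical touch(key, rank) call made after the ranking function has been consulted.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <vector>

class BucketRankCache {
public:
    BucketRankCache(std::size_t capacity, int maxRank)
        : capacity_(capacity), buckets_(static_cast<std::size_t>(maxRank) + 1) {}

    // Inserts the key or moves it to the bucket of its new rank.
    // The rank must lie within [0, maxRank].
    void touch(int key, int rank) {
        auto it = where_.find(key);
        if (it != where_.end()) {
            buckets_[it->second.rank].erase(it->second.pos);  // O(1) unlink
            where_.erase(it);
        } else if (where_.size() >= capacity_) {
            evictLowest();
        }
        buckets_[rank].push_front(key);
        where_[key] = Slot{rank, buckets_[rank].begin()};
    }

    bool contains(int key) const { return where_.count(key) != 0; }

private:
    struct Slot { int rank; std::list<int>::iterator pos; };

    // Scans buckets from the lowest rank upward: at most maxRank + 1 steps,
    // which is a constant when the rank range is fixed.
    void evictLowest() {
        for (auto& bucket : buckets_) {
            if (!bucket.empty()) {
                where_.erase(bucket.back());
                bucket.pop_back();
                return;
            }
        }
    }

    std::size_t capacity_;
    std::vector<std::list<int>> buckets_;  // buckets_[r] holds the keys with rank r
    std::unordered_map<int, Slot> where_;  // key -> (rank, node in its bucket)
};
```

The eviction scan is O(maxRank), so this is only "O(1)" in the sense that the rank range is a fixed constant independent of the cache size.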

- Chris August 01, 2017 | Flag Reply
