Bloomberg LP Interview Question
Senior Software Development EngineersCountry: United States
Interview Type: In-Person
Data Structure: HashMap and a PriorityQueue.
Suppose Product ID is a String, then we need:
class Product{
String ID;
int cnt;
Product(String id, int c) {
ID = id;
cnt = c;
}
}
Map<String, Integer> map = new HashMap<>();
Queue<Product> minTen = new PriorityQueue<>((x, y) -> x.cnt - y.cnt);
void add(String product) {
int cnt = 0;
if(map.containsKey(product) {
map.put(product, cnt = map.get(product) + 1);
} else
map.put(product, cnt = 1);
if(minTen.size() == 10)
minTen.poll();
minTen.add(new Product(product, cnt));
}
List<String> top10() {
List<String> result = new ArrayList<>();
for(String id : minTen) {
result.add(id);
}
return result;
}
We assume that the each log file corresponds to a specific date and log files arrive in increasing order of dates. For example, log file for 16th April 2016 follows the one for 15th April 2016. In each log file, there are zero or more entries for a product.
A. In order to keep track of the entries seen for the last 30 days for each product, one of a min-heap, a BST, or a doubly-linked list ordered by date should be maintained for each product. Each node corresponds to a specific date and number of entries for the product seen on that date. If it is a BST, each node should also store the total number of entries in the subtree rooted at that node.
B. There should be a cache, that maintains the total count of all entries for each product in the 30 day period. This will not be necessary if a BST was used in A, since the root of the BST will store the total count anyway.
C. When a new log file comes in, a node with the date and count of entries should be added to data structure in A for each product that appears in the log file. The data structure in B should also be updated by adding this count to the entry for each product.
At the same time, for each product, nodes in A corresponding to dates are earlier than 30 days, should be removed. B should also be updated by subtracting the total count of removed nodes for each product. In the case of BST, the entire subtree rooted at the node for the earlier date can be removed and the total count at the root be updated by subtracting the value in that subtree.
Since the window is 30 days in length, all these operations are constant time O(n=30) or O(log(n=30)), irrespective of the data structure used.
D. To keep track of the top 10 products in the window, a min-heap of size 10 should be maintained, where each node corresponds to a unique product and whose value is the total number of entries for that product in the 30 day window. The heap is organized by this value. This heap should be updated every time the data structure in A (and optionally B) is updated.
Also, for each node that appears in D, there should be a reference to it from the corresponding entry in B (or null, if a product in B is not a top-10 product and hence does not appear in D), so that the node can in D can be efficiently located, removed and reinserted with updated value if necessary, for each product whose count is updated.
E. Also, a product may not appear in some log files within a 30 day period at all. If there are a large number of products, it will be inefficient to scan the data structure in A for each product to determine whether nodes exist corresponding to earlier dates.
This can be mitigated by using a cache where the key is the date and the entry is the list of data structures in B for products that appear in logs files for that date. This way, for example, when the log file for 1st May comes along, the data structures B for products that appeared on 1st April can be efficiently retrieved and updated.
This way, the min-heap in D will always contain the top 10 products that appeared in log files within the last 30 days.
//
//
// THERE ARE SEVERAL LOG FILES COMING BY DATE WITH PRODUCT IDS
// AND I NEED TO REPORT THE TOP 10 (PRODUCT IDS) DURING A MOVING
// PERIOD OF 1 MONTH. DISCUSSED ABOUT THE DATA STRUCTURES NEEDED
// TO IMPLEMENT THE SOLUTION.
//
// -- bhardwaj.cs February 23, 2016 in United States
// Bloomberg LP Interview Question
//
// Run with VM arguments -ea to enable assert testing
//
// (c) 2016 Perry Anderson, All Rights Reserved, worldwide.
//
//
import java.util.*;
/*
* From what I can glean from the problem description the question
* being asked is how I would get information from an inventory
* database and sort the results on two attributes, when the
* products were stocked and how much of the product was stocked.
*
*/
class Product implements Comparable<Product> {
String id;
int count;
Date stocked;
Product(String id, int count, Date stocked) {
this.id = id;
this.count = count;
this.stocked = stocked;
}
@Override
public int compareTo(Product o) {
if ( o.stocked == stocked )
return o.count - count;
else
return (int)(o.stocked.getTime() - stocked.getTime());
}
}
public class LPMain005 {
static Product[] getReport(Product[] products) {
PriorityQueue<Product> queue =
new PriorityQueue<Product>(10);
for ( Product product : products )
queue.add(product);
Product[] report = new Product[products.length];
for ( int i=0; i<products.length; i++ )
report[i] = queue.poll();
return report;
}
public static void main(String[] args) {
Calendar calender = Calendar.getInstance();
calender.add(Calendar.MONTH, -1);
Date lastMonth = calender.getTime();
Date thisMonth = new Date();
Product[] test1 = {
new Product("Bananas",5,lastMonth),
new Product("Blackberries",15,thisMonth),
new Product("Yellowberries",7,thisMonth),
new Product("Apples",10,lastMonth),
new Product("Limes",3,lastMonth),
new Product("Blueberries",20,thisMonth),
new Product("Redberries",12,thisMonth)
};
Product[] results1 = {
new Product("Apples",10,lastMonth),
new Product("Bananas",5,lastMonth),
new Product("Limes",3,lastMonth),
new Product("Blueberries",20,thisMonth),
new Product("Blackberries",15,thisMonth),
new Product("Redberries",12,thisMonth),
new Product("Yellowberries",7,thisMonth)
};
/*
* Due to a glitch in the JVM we have to compare the arrays
* using a manual approach;
*/
// assert Arrays.equals(getReport(test1), results1);
// assert !Arrays.equals(getReport(test1), test1);
Product[] report = getReport(test1);
for ( int i=0; i<test1.length; i++ )
if ( !report[i].id.equals(results1[i].id) )
System.out.println("Different");
}
}
//
//
// THERE ARE SEVERAL LOG FILES COMING BY DATE WITH PRODUCT IDS
// AND I NEED TO REPORT THE TOP 10 (PRODUCT IDS) DURING A MOVING
// PERIOD OF 1 MONTH. DISCUSSED ABOUT THE DATA STRUCTURES NEEDED
// TO IMPLEMENT THE SOLUTION.
//
// -- bhardwaj.cs February 23, 2016 in United States
// Bloomberg LP Interview Question
//
// Run with VM arguments -ea to enable assert testing
//
// (c) 2016 Perry Anderson, All Rights Reserved, worldwide.
//
//
import java.util.*;
/*
* From what I can glean from the problem description the question
* being asked is how I would get information from an inventory
* database and sort the results on two attributes, when the
* products were stocked and how much of the product was stocked.
*
*/
class Product implements Comparable<Product> {
String id;
int count;
Date stocked;
Product(String id, int count, Date stocked) {
this.id = id;
this.count = count;
this.stocked = stocked;
}
/*
* The logic required to meet the specification of this
* task can be resolved using the compareTo() method to
* simply sort the Product objects based on the date
* (represented by 'stocked' here) and by the actual
* count of the items of the Product in stock.
*
* (non-Javadoc)
* @see java.lang.Comparable#compareTo(java.lang.Object)
*/
@Override
public int compareTo(Product o) {
if ( o.stocked == stocked )
return o.count - count;
else
return (int)(o.stocked.getTime() - stocked.getTime());
}
}
public class LPMain005 {
/*
* This method merely makes use of the PriorityQueue<>
* container which carries out a sort internally which
* depends on the implementation of the compareTo()
* method of the Product class.
*
*/
static Product[] getReport(Product[] products) {
PriorityQueue<Product> queue =
new PriorityQueue<Product>(10);
for ( Product product : products )
queue.add(product);
Product[] report = new Product[products.length];
for ( int i=0; i<products.length; i++ )
report[i] = queue.poll();
return report;
}
public static void main(String[] args) {
Calendar calender = Calendar.getInstance();
calender.add(Calendar.MONTH, -1);
Date lastMonth = calender.getTime();
Date thisMonth = new Date();
Product[] test1 = {
new Product("Bananas",5,lastMonth),
new Product("Blackberries",15,thisMonth),
new Product("Yellowberries",7,thisMonth),
new Product("Apples",10,lastMonth),
new Product("Limes",3,lastMonth),
new Product("Blueberries",20,thisMonth),
new Product("Redberries",12,thisMonth)
};
Product[] results1 = {
new Product("Apples",10,lastMonth),
new Product("Bananas",5,lastMonth),
new Product("Limes",3,lastMonth),
new Product("Blueberries",20,thisMonth),
new Product("Blackberries",15,thisMonth),
new Product("Redberries",12,thisMonth),
new Product("Yellowberries",7,thisMonth)
};
/*
* Due to a glitch in the JVM we have to compare the arrays
* using a manual approach;
*/
// assert Arrays.equals(getReport(test1), results1);
// assert !Arrays.equals(getReport(test1), test1);
Product[] report = getReport(test1);
for ( int i=0; i<test1.length; i++ )
if ( !report[i].id.equals(results1[i].id) )
System.out.println("Different");
}
}
Data structure: hash table.
Traverse every entry in the log file. For each entry use the entry's ID as the hash table's key. And if the date corresponding to that ID is within the specified range, increment the value of that key by 1 (initially initialized to 0). This operation is O(n).
Afterward, use a generic O(nlogn) algorithm to sort the hash table by values (or use a max-heap tree, where each node contains the hashtable's key-value pair). Report the key of the top 10 result.
So this the time complexity should be O(n)+O(nlogn) = O(n)
You need a Most/Least Recently Used Cache(L/MRU Cache). By implementing this, you'll need a Hash Table to record id, and a priority queue to maintain the retrieve times as highest priority. Meanwhile, your data structure must have two methods, detach() and insert() at any position in the structure.
The class could look something like this.
class Node{
int id;
int retrieve_times;
Node prev, next;
}
And, detach() and insert()
void detach(Node n){
if n is head of queue
head=n.next;
else n.prev.next=n.next;
if n is end of queue
end=n.prev
else n.next=prev=n.prev
}
void insert(Node n){
n.next=head;
n.prev=null;
if head is not null
head.prev=n
head=n;
if end is null
end=n
}
If incoming id hasn't appeared in the queue, do the following
Add (id, node) to Hash Table;
Increment retrieve time (r_t);
Insert node(id, r_t) to queue;
else
Increment r_t;
Detach node from queue;
Re-insert node into queue;
Since, your data structure uses priority queue, it automatically sort r_t in descending order. So once you insert node(id, r_t), the queue will put it at the relevantly proper position. Now, extract the first 10 elements in queue, you will have your top 10!
You'll need a complex data structure, more likely a Most Used Cache.
To do this, you need to define a new class to simulate the cache, a hash table to record id, and a max heap to function as your actual cache.
Class looks like this:
class Node{
int id; //id
int r_t; //Retrieve times
Node prev, next; //previous and next node
}
The cache basically performs the following:
If incoming id hasn't appeared in heap (Use Hash Table to retrieve)
Add it to Hash Table;
Create new Node(id, r_t=1)
Insert node into heap;
else
Delete node from heap;
Increment r_t by retrieving Hash Table using key=id;
Re-insert node into heap;
Note: the insert(Node n) and delete(Node n) operations have to be considered properly performed at any position in heap.
Since max heap keeps r_t in descending order, once you update r_t and re-insert the node, the heap will automatically put it into right position. To accomplish this, you can choose either to design your own heap, or to use PriorityQueue in built-in library. After importing all incoming data stream into your heap, in which the first 10 elements is your top 10 data.
Data Structure: HashMap and a PriorityQueue.
Suppose Product ID is a String, then we need:
class Product{
String ID;
int cnt;
Product(String id, int c) {
ID = id;
cnt = c;
}
}
Map<String, Integer> map = new HashMap<>();
Queue<Product> minTen = new PriorityQueue<>((x, y) -> x.cnt - y.cnt);
void add(String product) {
int cnt = 0;
if(map.containsKey(product) {
map.put(product, cnt = map.get(product) + 1);
} else
map.put(product, cnt = 1);
if(minTen.size() == 10)
minTen.poll();
minTen.add(new Product(product, cnt));
}
List<String> top10() {
List<String> result = new ArrayList<>();
for(String id : minTen) {
result.add(id);
}
return result;
}
Data Structure: HashMap and a PriorityQueue.
Suppose Product ID is a String, then we need:
class Product{
String ID;
int cnt;
Product(String id, int c) {
ID = id;
cnt = c;
}
}
Map<String, Integer> map = new HashMap<>();
Queue<Product> minTen = new PriorityQueue<>((x, y) -> x.cnt - y.cnt);
void add(String product) {
int cnt = 0;
if(map.containsKey(product) {
map.put(product, cnt = map.get(product) + 1);
} else
map.put(product, cnt = 1);
if(minTen.size() == 10)
minTen.poll();
minTen.add(new Product(product, cnt));
}
List<String> top10() {
List<String> result = new ArrayList<>();
for(String id : minTen) {
result.add(id);
}
return result;
}
You need a windowing approach to solve this problem. Actually the use case has been picked straight from Apache Spark which does many such operations. For the sake of this question you need to create a Min Heap for each product id. The timestamp decides which element is the root. Whenever you do a read or insert operation you delete elements from the heap till the only elements that are left are less than a month old and then you do the insert or read operation.
- Abhi February 27, 2016