THERE ARE SEVERAL LOG FILES CO

Bloomberg LP Interview Question for Senior Software Development Engineers

0

of 0 votes

10
Answers
THERE ARE SEVERAL LOG FILES COMING BY DATE WITH PRODUCT IDS AND I NEED TO REPORT THE TOP 10 (PRODUCT IDS) DURING A MOVING PERIOD OF 1 MONTH. DISCUSSED ABOUT THE DATA STRUCTURES NEEDED TO IMPLEMENT THE SOLUTION.
- bhardwaj.cs February 23, 2016 in United States | Report Duplicate | Flag | PURGE
Bloomberg LP Senior Software Development Engineer Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: In-Person

More Questions from This Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 2 vote

You need a windowing approach to solve this problem. Actually the use case has been picked straight from Apache Spark which does many such operations. For the sake of this question you need to create a Min Heap for each product id. The timestamp decides which element is the root. Whenever you do a read or insert operation you delete elements from the heap till the only elements that are left are less than a month old and then you do the insert or read operation.

- Abhi February 27, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 2 vote

Data Structure: HashMap and a PriorityQueue.

Suppose Product ID is a String, then we need:

class Product{
	String ID;
	int cnt;
	Product(String id, int c) {
		ID = id;
		cnt = c;
	}
}

Map<String, Integer> map = new HashMap<>();
Queue<Product> minTen = new PriorityQueue<>((x, y) -> x.cnt - y.cnt);

void add(String product) {
	int cnt = 0;
	if(map.containsKey(product) {
		map.put(product, cnt = map.get(product) + 1);
	} else 
		map.put(product, cnt = 1);
	if(minTen.size() == 10)
		minTen.poll();
	minTen.add(new Product(product, cnt));
}

List<String> top10() {
	List<String> result = new ArrayList<>();
	for(String id : minTen) {
		result.add(id);
	}
	return result;
}

- Chao March 02, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

We assume that the each log file corresponds to a specific date and log files arrive in increasing order of dates. For example, log file for 16th April 2016 follows the one for 15th April 2016. In each log file, there are zero or more entries for a product.

A. In order to keep track of the entries seen for the last 30 days for each product, one of a min-heap, a BST, or a doubly-linked list ordered by date should be maintained for each product. Each node corresponds to a specific date and number of entries for the product seen on that date. If it is a BST, each node should also store the total number of entries in the subtree rooted at that node.

B. There should be a cache, that maintains the total count of all entries for each product in the 30 day period. This will not be necessary if a BST was used in A, since the root of the BST will store the total count anyway.

C. When a new log file comes in, a node with the date and count of entries should be added to data structure in A for each product that appears in the log file. The data structure in B should also be updated by adding this count to the entry for each product.

At the same time, for each product, nodes in A corresponding to dates are earlier than 30 days, should be removed. B should also be updated by subtracting the total count of removed nodes for each product. In the case of BST, the entire subtree rooted at the node for the earlier date can be removed and the total count at the root be updated by subtracting the value in that subtree.

Since the window is 30 days in length, all these operations are constant time O(n=30) or O(log(n=30)), irrespective of the data structure used.

D. To keep track of the top 10 products in the window, a min-heap of size 10 should be maintained, where each node corresponds to a unique product and whose value is the total number of entries for that product in the 30 day window. The heap is organized by this value. This heap should be updated every time the data structure in A (and optionally B) is updated.

Also, for each node that appears in D, there should be a reference to it from the corresponding entry in B (or null, if a product in B is not a top-10 product and hence does not appear in D), so that the node can in D can be efficiently located, removed and reinserted with updated value if necessary, for each product whose count is updated.

E. Also, a product may not appear in some log files within a 30 day period at all. If there are a large number of products, it will be inefficient to scan the data structure in A for each product to determine whether nodes exist corresponding to earlier dates.

This can be mitigated by using a cache where the key is the date and the entry is the list of data structures in B for products that appear in logs files for that date. This way, for example, when the log file for 1st May comes along, the data structures B for products that appeared on 1st April can be efficiently retrieved and updated.

This way, the min-heap in D will always contain the top 10 products that appeared in log files within the last 30 days.

- sujoyg May 24, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

//
//
//  THERE ARE SEVERAL LOG FILES COMING BY DATE WITH PRODUCT IDS 
//  AND I NEED TO REPORT THE TOP 10 (PRODUCT IDS) DURING A MOVING 
//  PERIOD OF 1 MONTH. DISCUSSED ABOUT THE DATA STRUCTURES NEEDED 
//  TO IMPLEMENT THE SOLUTION.
//
//     -- bhardwaj.cs February 23, 2016 in United States
//        Bloomberg LP Interview Question 
//
// Run with VM arguments -ea to enable assert testing
//
// (c) 2016 Perry Anderson, All Rights Reserved, worldwide.
//
//

import java.util.*;

/*
 * From what I can glean from the problem description the question
 * being asked is how I would get information from an inventory 
 * database and sort the results on two attributes, when the 
 * products were stocked and how much of the product was stocked.
 *  
 */

class Product implements Comparable<Product> {
	String id;
	int count;
	Date stocked;
	Product(String id, int count, Date stocked) {
		this.id = id;
		this.count = count;
		this.stocked = stocked;
	}
	@Override
	public int compareTo(Product o) {
		if ( o.stocked == stocked )
			return o.count - count;
		else
			return (int)(o.stocked.getTime() - stocked.getTime());
	}
}

public class LPMain005 {
	
	static Product[] getReport(Product[] products) {
		PriorityQueue<Product> queue = 
	            new PriorityQueue<Product>(10);
		for ( Product product : products )
			queue.add(product);	
		Product[] report = new Product[products.length];
		for ( int i=0; i<products.length; i++ )
			report[i] = queue.poll();
		return report;
	}

	
	public static void main(String[] args) {

		Calendar calender = Calendar.getInstance();
		calender.add(Calendar.MONTH, -1);
		Date lastMonth = calender.getTime();
		Date thisMonth = new Date();
			
		Product[] test1 = { 
				new Product("Bananas",5,lastMonth),
				new Product("Blackberries",15,thisMonth),
				new Product("Yellowberries",7,thisMonth),
				new Product("Apples",10,lastMonth),
				new Product("Limes",3,lastMonth),
				new Product("Blueberries",20,thisMonth),
				new Product("Redberries",12,thisMonth)
		};
		
		Product[] results1 = { 
				new Product("Apples",10,lastMonth),
				new Product("Bananas",5,lastMonth),
				new Product("Limes",3,lastMonth),
				new Product("Blueberries",20,thisMonth),
				new Product("Blackberries",15,thisMonth),
				new Product("Redberries",12,thisMonth),
				new Product("Yellowberries",7,thisMonth)
		};

		/*
		 * Due to a glitch in the JVM we have to compare the arrays
		 * using a manual approach;
		 */
		
//		assert Arrays.equals(getReport(test1), results1);
//		assert !Arrays.equals(getReport(test1), test1);

		Product[] report = getReport(test1);
		for ( int i=0; i<test1.length; i++ )
			if ( !report[i].id.equals(results1[i].id) )
				System.out.println("Different");

	}
	
}

- perry.anderson November 25, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

//
//
//  THERE ARE SEVERAL LOG FILES COMING BY DATE WITH PRODUCT IDS 
//  AND I NEED TO REPORT THE TOP 10 (PRODUCT IDS) DURING A MOVING 
//  PERIOD OF 1 MONTH. DISCUSSED ABOUT THE DATA STRUCTURES NEEDED 
//  TO IMPLEMENT THE SOLUTION.
//
//     -- bhardwaj.cs February 23, 2016 in United States
//        Bloomberg LP Interview Question 
//
// Run with VM arguments -ea to enable assert testing
//
// (c) 2016 Perry Anderson, All Rights Reserved, worldwide.
//
//

import java.util.*;

/*
 * From what I can glean from the problem description the question
 * being asked is how I would get information from an inventory 
 * database and sort the results on two attributes, when the 
 * products were stocked and how much of the product was stocked.
 *  
 */

class Product implements Comparable<Product> {
	String id;
	int count;
	Date stocked;
	Product(String id, int count, Date stocked) {
		this.id = id;
		this.count = count;
		this.stocked = stocked;
	}
	
	/*
	 * The logic required to meet the specification of this
	 * task can be resolved using the compareTo() method to
	 * simply sort the Product objects based on the date
	 * (represented by 'stocked' here) and by the actual
	 * count of the items of the Product in stock.
	 * 
	 * (non-Javadoc)
	 * @see java.lang.Comparable#compareTo(java.lang.Object)
	 */
	
	@Override
	public int compareTo(Product o) {
		if ( o.stocked == stocked )
			return o.count - count;
		else
			return (int)(o.stocked.getTime() - stocked.getTime());
	}
}

public class LPMain005 {
	
	/*
	 * This method merely makes use of the PriorityQueue<>
	 * container which carries out a sort internally which
	 * depends on the implementation of the compareTo() 
	 * method of the Product class.
	 * 
	 */
	static Product[] getReport(Product[] products) {
		PriorityQueue<Product> queue = 
	            new PriorityQueue<Product>(10);
		for ( Product product : products )
			queue.add(product);	
		Product[] report = new Product[products.length];
		for ( int i=0; i<products.length; i++ )
			report[i] = queue.poll();
		return report;
	}

	
	public static void main(String[] args) {

		Calendar calender = Calendar.getInstance();
		calender.add(Calendar.MONTH, -1);
		Date lastMonth = calender.getTime();
		Date thisMonth = new Date();
			
		Product[] test1 = { 
				new Product("Bananas",5,lastMonth),
				new Product("Blackberries",15,thisMonth),
				new Product("Yellowberries",7,thisMonth),
				new Product("Apples",10,lastMonth),
				new Product("Limes",3,lastMonth),
				new Product("Blueberries",20,thisMonth),
				new Product("Redberries",12,thisMonth)
		};
		
		Product[] results1 = { 
				new Product("Apples",10,lastMonth),
				new Product("Bananas",5,lastMonth),
				new Product("Limes",3,lastMonth),
				new Product("Blueberries",20,thisMonth),
				new Product("Blackberries",15,thisMonth),
				new Product("Redberries",12,thisMonth),
				new Product("Yellowberries",7,thisMonth)
		};

		/*
		 * Due to a glitch in the JVM we have to compare the arrays
		 * using a manual approach;
		 */
		
//		assert Arrays.equals(getReport(test1), results1);
//		assert !Arrays.equals(getReport(test1), test1);

		Product[] report = getReport(test1);
		for ( int i=0; i<test1.length; i++ )
			if ( !report[i].id.equals(results1[i].id) )
				System.out.println("Different");

	}
	
}

- perry.anderson November 30, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

Data structure: hash table.

Traverse every entry in the log file. For each entry use the entry's ID as the hash table's key. And if the date corresponding to that ID is within the specified range, increment the value of that key by 1 (initially initialized to 0). This operation is O(n).

Afterward, use a generic O(nlogn) algorithm to sort the hash table by values (or use a max-heap tree, where each node contains the hashtable's key-value pair). Report the key of the top 10 result.

So this the time complexity should be O(n)+O(nlogn) = O(n)

- Jimmy February 25, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

You need a Most/Least Recently Used Cache(L/MRU Cache). By implementing this, you'll need a Hash Table to record id, and a priority queue to maintain the retrieve times as highest priority. Meanwhile, your data structure must have two methods, detach() and insert() at any position in the structure.

The class could look something like this.

class Node{
	int id;
        int retrieve_times;
	Node prev, next;
}

And, detach() and insert()

void detach(Node n){
	if n is head of queue
		head=n.next;
	else n.prev.next=n.next;

	if n is end of queue
		end=n.prev
	else n.next=prev=n.prev
}

void insert(Node n){
	n.next=head;
	n.prev=null;
	if head is not null 
		head.prev=n
	head=n;
	if end is null
		end=n
}

If incoming id hasn't appeared in the queue, do the following
Add (id, node) to Hash Table;
Increment retrieve time (r_t);
Insert node(id, r_t) to queue;
else
Increment r_t;
Detach node from queue;
Re-insert node into queue;

Since, your data structure uses priority queue, it automatically sort r_t in descending order. So once you insert node(id, r_t), the queue will put it at the relevantly proper position. Now, extract the first 10 elements in queue, you will have your top 10!

- Anonymous February 29, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

You'll need a complex data structure, more likely a Most Used Cache.
To do this, you need to define a new class to simulate the cache, a hash table to record id, and a max heap to function as your actual cache.

Class looks like this:

class Node{
	int id;	//id
	int r_t;	//Retrieve times
	Node prev, next; //previous and next node
}

The cache basically performs the following:

If incoming id hasn't appeared in heap (Use Hash Table to retrieve)
	Add it to Hash Table;
	Create new Node(id, r_t=1)
	Insert node into heap;
else
	Delete node from heap;
	Increment r_t by retrieving Hash Table using key=id;
	Re-insert node into heap;

Note: the insert(Node n) and delete(Node n) operations have to be considered properly performed at any position in heap.

Since max heap keeps r_t in descending order, once you update r_t and re-insert the node, the heap will automatically put it into right position. To accomplish this, you can choose either to design your own heap, or to use PriorityQueue in built-in library. After importing all incoming data stream into your heap, in which the first 10 elements is your top 10 data.

- hcallen.wang February 29, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

Data Structure: HashMap and a PriorityQueue.

Suppose Product ID is a String, then we need:

class Product{
	String ID;
	int cnt;
	Product(String id, int c) {
		ID = id;
		cnt = c;
	}
}

Map<String, Integer> map = new HashMap<>();
Queue<Product> minTen = new PriorityQueue<>((x, y) -> x.cnt - y.cnt);

void add(String product) {
	int cnt = 0;
	if(map.containsKey(product) {
		map.put(product, cnt = map.get(product) + 1);
	} else 
		map.put(product, cnt = 1);
	if(minTen.size() == 10)
		minTen.poll();
	minTen.add(new Product(product, cnt));
}

List<String> top10() {
	List<String> result = new ArrayList<>();
	for(String id : minTen) {
		result.add(id);
	}
	return result;
}

- Chao March 02, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

Data Structure: HashMap and a PriorityQueue.

Suppose Product ID is a String, then we need:

class Product{
	String ID;
	int cnt;
	Product(String id, int c) {
		ID = id;
		cnt = c;
	}
}

Map<String, Integer> map = new HashMap<>();
Queue<Product> minTen = new PriorityQueue<>((x, y) -> x.cnt - y.cnt);

void add(String product) {
	int cnt = 0;
	if(map.containsKey(product) {
		map.put(product, cnt = map.get(product) + 1);
	} else 
		map.put(product, cnt = 1);
	if(minTen.size() == 10)
		minTen.poll();
	minTen.add(new Product(product, cnt));
}

List<String> top10() {
	List<String> result = new ArrayList<>();
	for(String id : minTen) {
		result.add(id);
	}
	return result;
}

- zzz10210952 March 02, 2016 | Flag Reply

CareerCup

Bloomberg LP Interview Question for Senior Software Development Engineers

Books

Videos

Resume Review

Mock Interviews