Amazon Interview Question
SDE1s
Country: United States
Interview Type: In-Person
There is a lot of ambiguity here that needs further clarification. A few questions:
1. Does the system need to handle logs for multiple days?
2. If the answer to #1 is yes, can the queries span multiple days?
3. If the answer to #1 is no, can a user visit multiple pages in a day, or the same page multiple times?
4. Do we need to keep visit timestamps?
What about
HashMap<Integer, HashMap<Integer, Integer>>
?
Which means
HashMap<Pages, HashMap<Users, VisitedTimes>>
I don't know whether it's a valid solution.
If I have made some mistakes, please let me know.
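A minimal Java sketch of this nested-map idea (method names are my own, and I've assumed integer page and user IDs); each query is a scan over the outer map:

```java
import java.util.*;

class PageVisitLog {
    // pageId -> (userId -> number of visits)
    private final Map<Integer, Map<Integer, Integer>> visits = new HashMap<>();

    void addLog(int pageId, int userId) {
        visits.computeIfAbsent(pageId, k -> new HashMap<>())
              .merge(userId, 1, Integer::sum);
    }

    // Q1: pages visited by exactly k distinct users
    List<Integer> pagesWithExactlyKUsers(int k) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Map<Integer, Integer>> e : visits.entrySet())
            if (e.getValue().size() == k)
                result.add(e.getKey());
        return result;
    }

    // Q2: pages visited by exactly one user, exactly t times
    List<Integer> pagesWithOneUserKTimes(int t) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Map<Integer, Integer>> e : visits.entrySet())
            if (e.getValue().size() == 1
                    && e.getValue().values().iterator().next() == t)
                result.add(e.getKey());
        return result;
    }

    // Q3: pages visited by a given user at least t times
    List<Integer> pagesVisitedByUserAtLeast(int userId, int t) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Map<Integer, Integer>> e : visits.entrySet()) {
            Integer c = e.getValue().get(userId);
            if (c != null && c >= t)
                result.add(e.getKey());
        }
        return result;
    }
}
```

So the structure is valid, but every query is O(#pages); the later answers in this thread trade extra maps for faster lookups.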
Since the queries focus on users, I think it is better to key the outer HashTable by UserId:
a HashTable whose key is UserId and whose value is another HashTable of (pageId, Date).
Only the third query is focused on an individual user. The other two are not concerned with individual users, only with the total count of users visiting a page.
These are the pairs we are supposed to maintain as HashMaps, so that we can retrieve the values quickly.
We can create a class 'UserInfo' with these values and have 4 different HashMaps, with the elements as keys:
User Id - Page No - No Of Visits - Visited Date
We also need to maintain the following counters in the processing logic and update them accordingly (these come in handy):
No of Users per Page - No of Pages per User Id per Day
Using a graph adjacency-list implementation:
User1
----
id
name
List<Page> visited pages
User2
----
id
name
List<Page> visited pages
User3
----
id
name
List<Page> visited pages
------------------------
Page
int pageno;
int weight;
--------------------------
Increase the weight each time the page is visited.
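A rough Java sketch of this adjacency-list idea (class and field names are mine): each user keeps weighted edges to pages, and the edge weight is bumped on every visit. I've keyed the visited pages by page number rather than using a plain List so the increment is O(1):

```java
import java.util.*;

class Page {
    int pageNo;
    int weight; // number of times this user visited the page
    Page(int pageNo) { this.pageNo = pageNo; }
}

class User {
    int id;
    String name;
    Map<Integer, Page> visitedPages = new HashMap<>(); // pageNo -> edge

    User(int id, String name) { this.id = id; this.name = name; }

    void visit(int pageNo) {
        visitedPages.computeIfAbsent(pageNo, Page::new).weight++;
    }
}
```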
Requirements:
need to track the # of visits to each page
need to track who visited each page and how many times
need to track each user's activity: which pages they visited and how many times
I think we need 2 data structures to solve this problem: one geared toward pages and the other toward users.
For pages, we could use a binary max-heap, where the heap key is a pair of (total number of visits to this page, number of unique users who visited it).
Each node will also hold the list of users who visited the page, and of course the page information itself.
This should answer the first and second questions quickly, though the heap walks described below can degrade to linear in the worst case.
Question 1: which page was visited by exactly 2 users in a day?
We search through the heap until we reach a pair with a unique-user count of 2.
Question 2: which page was visited by only one user, exactly 2 times in a day?
We walk down the heap again, looking for a pair with 1 as the number of unique users and 2 as the total number of times the page was visited.
Question 3 is geared toward users; it would be inefficient to use the same data structure, so we should (could) use a hash table for users, something like:
hash table <User, List of Pair<Page, # of visits to this page>>
With this data structure, for question 3 we retrieve the list of pairs for user 3, look for the pairs with # of visits = 5, and return the page.
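The user-side table could be sketched like this in Java (names are mine; I've stored the poster's "list of pairs" as a per-user map so the counts are easy to update):

```java
import java.util.*;

class UserVisits {
    // userId -> (pageId -> visit count)
    private final Map<Integer, Map<Integer, Integer>> byUser = new HashMap<>();

    void record(int userId, int pageId) {
        byUser.computeIfAbsent(userId, k -> new HashMap<>())
              .merge(pageId, 1, Integer::sum);
    }

    // Q3: pages this user visited exactly `times` times
    List<Integer> pagesVisitedExactly(int userId, int times) {
        List<Integer> pages = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e
                : byUser.getOrDefault(userId, Map.of()).entrySet())
            if (e.getValue() == times)
                pages.add(e.getKey());
        return pages;
    }
}
```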
Here is the solution:
We will have 3 maps.
1. map<int, set<int> > uniqueVisitsToPageMap
This ranks pages by the number of unique visitors. Assume page 1 was visited by 2 unique users and then a new unique user visits it: we remove page 1 from the key for 2 visitors and add it under the key for 3.
2. map<int, set<int> > pageToUsersTrackMap
This map tells us, for each page, which users visited it. As you can notice the value is a set, so there are no duplicate user IDs.
3. map<int, map<int, int> > userToPageMap
This maps each user ID to the pages they visited and how many times they visited each:
UserID -> { pageID -> visit count }
Now, assuming map is a hash table (unordered_map) and set is a hash set (unordered_set):
* the first requirement costs O(1) to *find the pages*;
* the second requirement costs O(1) to find the candidate pages and O(N) to *find the pages with 2 visits*, where N is the number of pages with 1 unique visitor (yes, we could do better with a BiMap);
* the third requirement costs O(1) to find the user and O(N) to find the pages, where N is the number of pages visited by that user.
Here is the code. I use map and set here just to keep the code short, but you can replace them with unordered_map and unordered_set:
#include <map>
#include <set>
#include <vector>
using namespace std;

class LogFileQuery {
public:
    void addLog(int userID, int pageID) {
        int visitsRank = pageToUsersTrackMap[pageID].size();
        // First visit by this user: move the page up one unique-visitor rank.
        // (Note the rank-1 insert must also happen on the very first visit,
        // i.e. when visitsRank == 0.)
        if (pageToUsersTrackMap[pageID].count(userID) == 0) {
            if (visitsRank > 0)
                uniqueVisitsToPageMap[visitsRank].erase(pageID);
            uniqueVisitsToPageMap[visitsRank + 1].insert(pageID);
        }
        pageToUsersTrackMap[pageID].insert(userID);
        userToPageMap[userID][pageID]++; // userID -> pageID | visits
    }

    vector<int> pagesByVisits(int numberOfUniqueUsers) {
        vector<int> pages;
        if (uniqueVisitsToPageMap.count(numberOfUniqueUsers) > 0) {
            for (auto& p : uniqueVisitsToPageMap[numberOfUniqueUsers])
                pages.push_back(p);
        }
        return pages;
    }

    vector<int> pagesVisitedByOnlyOneUser(int times) {
        vector<int> pages;
        if (uniqueVisitsToPageMap.count(1) == 0)
            return pages;
        for (auto& pageID : uniqueVisitsToPageMap[1]) {
            // Exactly one user visited this page; check their visit count.
            int userID = *pageToUsersTrackMap[pageID].begin();
            if (userToPageMap[userID][pageID] == times)
                pages.push_back(pageID);
        }
        return pages;
    }

    vector<int> pagesVisitedByUser(int userID, int moreThanTimes) {
        vector<int> pages;
        if (userToPageMap.count(userID) > 0) {
            for (auto& p : userToPageMap[userID]) {
                if (p.second > moreThanTimes)
                    pages.push_back(p.first);
            }
        }
        return pages;
    }

protected:
    map<int, map<int, int> > userToPageMap;    // userID -> pageID | visits
    map<int, set<int> > pageToUsersTrackMap;   // pageID -> visiting user IDs
    map<int, set<int> > uniqueVisitsToPageMap; // unique visitor # -> page IDs
};
Map<Integer, Map<String, LinkedList<String>>>
i.e. Map<#ofVisits, Map<PageName, List<Usernames>>>,
and then another Map<String, Integer> for Map<PageName, #ofVisits>.
When a page is visited, you find it in the second map; based on its # of visits you find it in the first map, add the newly visited user to the list, and move that page up in the first map, e.g. if it was mapped against 2, you move it to 3.
At any given moment, it is easy to answer the 3 questions from the first map.
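The bucket-move update described above could look roughly like this in Java (class and method names are mine):

```java
import java.util.*;

class VisitBuckets {
    // visit count -> (pageName -> users who visited, in visit order)
    private final Map<Integer, Map<String, List<String>>> byCount = new HashMap<>();
    // pageName -> current visit count
    private final Map<String, Integer> countOf = new HashMap<>();

    void visit(String page, String user) {
        int old = countOf.getOrDefault(page, 0);
        // Pull the page out of its old bucket (if any), then re-file it
        // one bucket higher with the new visitor appended.
        List<String> users = (old == 0)
                ? new ArrayList<>()
                : byCount.get(old).remove(page);
        users.add(user);
        countOf.put(page, old + 1);
        byCount.computeIfAbsent(old + 1, k -> new HashMap<>()).put(page, users);
    }

    Map<String, List<String>> pagesWithVisitCount(int n) {
        return byCount.getOrDefault(n, Map.of());
    }
}
```

Note the list holds one entry per visit, not per unique user, so answering "exactly 2 users" still requires deduplicating the list (or storing a set alongside it).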
I think the data structure is:
public class PageVisitedHistory
{
    public int PageId;
    public List<UserInfo> userInfoList;
}
public class UserInfo
{
    public int userId;
    public int counter;
}
My assumption is that the data structure gets reloaded every day.
List<PageVisitedHistory> pageHistoryCurrentDate = GetPageHistory(DateTime.Now);
Answers are:
1) pageHistoryCurrentDate.Find(p => p.userInfoList.Count == 2).PageId;
2) pageHistoryCurrentDate.Find(p => p.userInfoList.Count == 1 && p.userInfoList[0].counter == 2).PageId;
3) pageHistoryCurrentDate.Find(p => p.userInfoList.Find(u => u.userId == 3 && u.counter >= 5) != null).PageId;
Please let me know if my approach is correct.
Hash table indexed by page number. Each entry points to another hash table indexed by user number. Each user entry contains the number of times that user visited that page.
unordered_map<int page_number, unordered_map<int user_number, int num_of_visits>>
1. Iterate through the outer hash_map. For each entry, get the size of the user hash_map. If it is 2, output it. Time complexity is O(N).
2. Iterate through the outer hash_map. For each entry, check if the size of the user hash_map is 1, and if it is, check if the number of visits by that one user is 2. If so, then output. Time complexity is O(N).
3. Iterate through the outer hash_map of pages. For each entry, look for user 3 in that user hash map. If found, check if the times visited is 5. If so, then output it. Time complexity is O(N).
That leaves one big problem: the above does not take into account the 'in a day' restriction. To fix that, I would probably have a background job that clears out entries older than 24 hours. Run this job every day, or as often as you can afford.
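One simple way to sketch that day scoping (names are mine): key the outer structure by date and drop old dates on demand, rather than running a timed job:

```java
import java.time.LocalDate;
import java.util.*;

class DailyVisitLog {
    // date -> (pageId -> (userId -> visits)); old dates are purged on demand
    private final Map<LocalDate, Map<Integer, Map<Integer, Integer>>> byDay = new HashMap<>();

    void addLog(LocalDate day, int pageId, int userId) {
        byDay.computeIfAbsent(day, d -> new HashMap<>())
             .computeIfAbsent(pageId, p -> new HashMap<>())
             .merge(userId, 1, Integer::sum);
    }

    // The "background job": drop every day strictly before `keepFrom`.
    void purgeBefore(LocalDate keepFrom) {
        byDay.keySet().removeIf(d -> d.isBefore(keepFrom));
    }

    Map<Integer, Map<Integer, Integer>> visitsOn(LocalDate day) {
        return byDay.getOrDefault(day, Map.of());
    }
}
```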
Your solution is the same as the previous solution posted. @gilles_monnier@yahoo.fr rightly pointed out: what if the same page is visited by multiple users? Keying the map by pageId would overwrite the map of <userId, NumberOfVisits>.
The solution could be Map<PageId, List<Map<userId, visits>>>. But, as suggested below, I don't think it's the most optimal solution.
If the same page is visited by multiple users, you just increment the visit count for each of the users on that page. I don't think you need a list of maps like you've pointed out.
cyborg is talking about the overwrites... you can't handle it with your solution. Each time a page is visited by a user, the value for the pageId key is overwritten.
I think qakbot100's solution does not have the problem you pointed out. When the page is visited for the very first time by user A, we create a map entry with the page id and a new HashMap that contains <A, 1>. If later the same page is visited by a second user B, instead of overwriting the HashMap<userId, visits>, we insert a new record <B, 1> into that map. And if user A comes to visit a second time, we locate the entry inside HashMap<userId, visits> and increment visits by 1. Please let me know if I got this wrong.
Not sure if that would lead to the most efficient solution...
The granularity of the time seems to be 1 day, but let's assume "any time range".
So for each given "time range" we should be able to build the graph of users and pages, where each link is non-directional with the assigned cost of # of times the user visited that page:
        /-- (visited 3 times) -- pageT
userX --+-- (visited 2 times) -- pageY
        \-- (visited 5 times) -- pageZ
Any graph gurus?
Looks to me like a weighted graph is most suitable. Vertices are users and pages.
- Anonymous February 03, 2015
Weights are the numbers of visits per day for a specific user and page.
It can be represented as this 2D matrix (per day):
      | Page1 | Page2 | Page3 | ...
User1 |   1   |   0   |   2   | ...
User2 |   0   |   6   |   2   | ...
User3 |   3   |   2   |   4   | ...
...
As for how to store this efficiently, I'm not sure.
It could be an array in combination with linked lists, or a HashTable in combination with linked lists...
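As a concrete (if memory-hungry) baseline, the matrix could simply be an int[][] in Java, assuming dense integer IDs; the names below are mine:

```java
class VisitMatrix {
    // visits[u][p] = times user u visited page p on this day
    private final int[][] visits;

    VisitMatrix(int users, int pages) {
        visits = new int[users][pages];
    }

    void visit(int user, int page) { visits[user][page]++; }

    int count(int user, int page) { return visits[user][page]; }
}
```

With sparse traffic (most users never touch most pages), the hash-map variants discussed earlier in the thread would use far less memory.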