Amazon Interview Question
SDE1s
Country: United States
Interview Type: In-Person
There is a lot of ambiguity here that needs further clarification. A few questions:
1. Does the system need to handle logs for multiple days?
2. If the answer to #1 is yes, can the queries span multiple days?
3. If the answer to #1 is no, can a user visit multiple pages in a day, or the same page multiple times?
4. Do we need to keep visit timestamps?
What about
HashMap<Integer, HashMap<Integer, Integer>>
?
Which means
HashMap<Pages, HashMap<Users, VisitedTimes>>
I don't know whether it's a valid solution.
If I have made some mistakes, please let me know.
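A minimal Java sketch of this nested-map idea (method names are my own, and I've assumed integer page and user IDs); each query is a scan over the outer map:

```java
import java.util.*;

class PageVisitLog {
    // pageId -> (userId -> number of visits)
    private final Map<Integer, Map<Integer, Integer>> visits = new HashMap<>();

    void addLog(int pageId, int userId) {
        visits.computeIfAbsent(pageId, k -> new HashMap<>())
              .merge(userId, 1, Integer::sum);
    }

    // Q1: pages visited by exactly k distinct users
    List<Integer> pagesWithExactlyKUsers(int k) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Map<Integer, Integer>> e : visits.entrySet())
            if (e.getValue().size() == k)
                result.add(e.getKey());
        return result;
    }

    // Q2: pages visited by exactly one user, exactly t times
    List<Integer> pagesWithOneUserKTimes(int t) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Map<Integer, Integer>> e : visits.entrySet())
            if (e.getValue().size() == 1
                    && e.getValue().values().iterator().next() == t)
                result.add(e.getKey());
        return result;
    }

    // Q3: pages visited by a given user at least t times
    List<Integer> pagesVisitedByUserAtLeast(int userId, int t) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Map<Integer, Integer>> e : visits.entrySet()) {
            Integer c = e.getValue().get(userId);
            if (c != null && c >= t)
                result.add(e.getKey());
        }
        return result;
    }
}
```

So the structure is valid, but every query is O(#pages); the later answers in this thread trade extra maps for faster lookups.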
Since the queries focus on users, I think it is better to key the outer HashTable by UserId:
a HashTable whose key is UserId and whose value is another HashTable of (pageId, Date).
Only the third query is focused on an individual user. The other two are not concerned with individual users, only with the total count of users visiting a page.
These are the pairs we are supposed to maintain as HashMaps, so that we can retrieve the values quickly.
We can create a class 'UserInfo' with these values and have 4 different HashMaps, with the elements as keys:
User Id - Page No - No Of Visits - Visited Date
We also need to maintain the following counters in the processing logic and update them accordingly (these come in handy):
No of Users per Page - No of Pages per User Id per Day
Using a graph adjacency-list implementation:
User1
----
id
name
List<Page> visited pages
User2
----
id
name
List<Page> visited pages
User3
----
id
name
List<Page> visited pages
------------------------
Page
int pageno;
int weight;
--------------------------
Increase the weight each time the page is visited.
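A rough Java sketch of this adjacency-list idea (class and field names are mine): each user keeps weighted edges to pages, and the edge weight is bumped on every visit. I've keyed the visited pages by page number rather than using a plain List so the increment is O(1):

```java
import java.util.*;

class Page {
    int pageNo;
    int weight; // number of times this user visited the page
    Page(int pageNo) { this.pageNo = pageNo; }
}

class User {
    int id;
    String name;
    Map<Integer, Page> visitedPages = new HashMap<>(); // pageNo -> edge

    User(int id, String name) { this.id = id; this.name = name; }

    void visit(int pageNo) {
        visitedPages.computeIfAbsent(pageNo, Page::new).weight++;
    }
}
```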
Requirements:
need to track the # of visits to each page
need to track who visited each page and how many times
need to track each user's activity: which pages they visited and how many times
I think we need 2 data structures to solve this problem: one geared toward pages and the other toward users.
For pages, we could use a binary max-heap, where the heap key is a pair of (total number of visits to this page, number of unique users who visited it).
Each node will also hold the list of users who visited the page, and of course the page information itself.
This should answer the first and second questions quickly, though the heap walks described below can degrade to linear in the worst case.
Question 1: which page was visited by exactly 2 users in a day?
We search through the heap until we reach a pair with a unique-user count of 2.
Question 2: which page was visited by only one user, exactly 2 times in a day?
We walk down the heap again, looking for a pair with 1 as the number of unique users and 2 as the total number of times the page was visited.
Question 3 is geared toward users; it would be inefficient to use the same data structure, so we should (could) use a hash table for users, something like:
hash table <User, List of Pair<Page, # of visits to this page>>
With this data structure, for question 3 we retrieve the list of pairs for user 3, look for the pairs with # of visits = 5, and return the page.
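The user-side table could be sketched like this in Java (names are mine; I've stored the poster's "list of pairs" as a per-user map so the counts are easy to update):

```java
import java.util.*;

class UserVisits {
    // userId -> (pageId -> visit count)
    private final Map<Integer, Map<Integer, Integer>> byUser = new HashMap<>();

    void record(int userId, int pageId) {
        byUser.computeIfAbsent(userId, k -> new HashMap<>())
              .merge(pageId, 1, Integer::sum);
    }

    // Q3: pages this user visited exactly `times` times
    List<Integer> pagesVisitedExactly(int userId, int times) {
        List<Integer> pages = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e
                : byUser.getOrDefault(userId, Map.of()).entrySet())
            if (e.getValue() == times)
                pages.add(e.getKey());
        return pages;
    }
}
```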
Here is the solution:
We will have 3 maps.
1. map<int, set<int> > uniqueVisitsToPageMap
This ranks pages by the number of unique visitors. Assume page 1 was visited by 2 unique users and then a new unique user visits it: we remove page 1 from the key for 2 visitors and add it under the key for 3.
2. map<int, set<int> > pageToUsersTrackMap
This map tells us, for each page, which users visited it. As you can notice the value is a set, so there are no duplicate user IDs.
3. map<int, map<int, int> > userToPageMap
This maps each user ID to the pages they visited and how many times they visited each:
UserID -> { pageID -> visit count }
Now, assuming map is a hash table (unordered_map) and set is a hash set (unordered_set):
* the first requirement costs O(1) to *find the pages*;
* the second requirement costs O(1) to find the candidate pages and O(N) to *find the pages with 2 visits*, where N is the number of pages with 1 unique visitor (yes, we could do better with a BiMap);
* the third requirement costs O(1) to find the user and O(N) to find the pages, where N is the number of pages visited by that user.
Here is the code. I use map and set here just to keep the code short, but you can replace them with unordered_map and unordered_set:
#include <map>
#include <set>
#include <vector>
using namespace std;

class LogFileQuery {
public:
    void addLog(int userID, int pageID) {
        int visitsRank = pageToUsersTrackMap[pageID].size();
        // First visit by this user: move the page up one unique-visitor rank.
        // (Note the rank-1 insert must also happen on the very first visit,
        // i.e. when visitsRank == 0.)
        if (pageToUsersTrackMap[pageID].count(userID) == 0) {
            if (visitsRank > 0)
                uniqueVisitsToPageMap[visitsRank].erase(pageID);
            uniqueVisitsToPageMap[visitsRank + 1].insert(pageID);
        }
        pageToUsersTrackMap[pageID].insert(userID);
        userToPageMap[userID][pageID]++; // userID -> pageID | visits
    }

    vector<int> pagesByVisits(int numberOfUniqueUsers) {
        vector<int> pages;
        if (uniqueVisitsToPageMap.count(numberOfUniqueUsers) > 0) {
            for (auto& p : uniqueVisitsToPageMap[numberOfUniqueUsers])
                pages.push_back(p);
        }
        return pages;
    }

    vector<int> pagesVisitedByOnlyOneUser(int times) {
        vector<int> pages;
        if (uniqueVisitsToPageMap.count(1) == 0)
            return pages;
        for (auto& pageID : uniqueVisitsToPageMap[1]) {
            // Exactly one user visited this page; check their visit count.
            int userID = *pageToUsersTrackMap[pageID].begin();
            if (userToPageMap[userID][pageID] == times)
                pages.push_back(pageID);
        }
        return pages;
    }

    vector<int> pagesVisitedByUser(int userID, int moreThanTimes) {
        vector<int> pages;
        if (userToPageMap.count(userID) > 0) {
            for (auto& p : userToPageMap[userID]) {
                if (p.second > moreThanTimes)
                    pages.push_back(p.first);
            }
        }
        return pages;
    }

protected:
    map<int, map<int, int> > userToPageMap;    // userID -> pageID | visits
    map<int, set<int> > pageToUsersTrackMap;   // pageID -> visiting user IDs
    map<int, set<int> > uniqueVisitsToPageMap; // unique visitor # -> page IDs
};
Map<Integer, Map<String, LinkedList<String>>>
i.e. Map<#ofVisits, Map<PageName, List<Usernames>>>,
and then another Map<String, Integer> for Map<PageName, #ofVisits>.
When a page is visited, you find it in the second map; based on its # of visits you find it in the first map, add the newly visited user to the list, and move that page up in the first map, e.g. if it was mapped against 2, you move it to 3.
At any given moment, it is easy to answer the 3 questions from the first map.
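The bucket-move update described above could look roughly like this in Java (class and method names are mine):

```java
import java.util.*;

class VisitBuckets {
    // visit count -> (pageName -> users who visited, in visit order)
    private final Map<Integer, Map<String, List<String>>> byCount = new HashMap<>();
    // pageName -> current visit count
    private final Map<String, Integer> countOf = new HashMap<>();

    void visit(String page, String user) {
        int old = countOf.getOrDefault(page, 0);
        // Pull the page out of its old bucket (if any), then re-file it
        // one bucket higher with the new visitor appended.
        List<String> users = (old == 0)
                ? new ArrayList<>()
                : byCount.get(old).remove(page);
        users.add(user);
        countOf.put(page, old + 1);
        byCount.computeIfAbsent(old + 1, k -> new HashMap<>()).put(page, users);
    }

    Map<String, List<String>> pagesWithVisitCount(int n) {
        return byCount.getOrDefault(n, Map.of());
    }
}
```

Note the list holds one entry per visit, not per unique user, so answering "exactly 2 users" still requires deduplicating the list (or storing a set alongside it).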
I think the data structure is:
public class PageVisitedHistory
{
    public int PageId;
    public List<UserInfo> userInfoList;
}
public class UserInfo
{
    public int userId;
    public int counter;
}
My assumption is that the data structure gets reloaded every day.
List<PageVisitedHistory> pageHistoryCurrentDate = GetPageHistory(DateTime.Now);
Answers are:
1) pageHistoryCurrentDate.Find(p => p.userInfoList.Count == 2).PageId;
2) pageHistoryCurrentDate.Find(p => p.userInfoList.Count == 1 && p.userInfoList[0].counter == 2).PageId;
3) pageHistoryCurrentDate.Find(p => p.userInfoList.Find(u => u.userId == 3 && u.counter >= 5) != null).PageId;
Please let me know if my approach is correct.
Hash table indexed by page number. Each entry points to another hash table indexed by user number. Each user entry contains the number of times that user visited that page.
unordered_map<int page_number, unordered_map<int user_number, int num_of_visits>>
1. Iterate through the outer hash_map. For each entry, get the size of the user hash_map. If it is 2, output it. Time complexity is O(N).
2. Iterate through the outer hash_map. For each entry, check if the size of the user hash_map is 1, and if it is, check if the number of visits by that one user is 2. If so, then output. Time complexity is O(N).
3. Iterate through the outer hash_map of pages. For each entry, look for user 3 in that user hash map. If found, check if the times visited is 5. If so, then output it. Time complexity is O(N).
That leaves one big problem: the above does not take into account the 'in a day' restriction. To fix that, I would probably have a background job that clears out entries older than 24 hours. Run this job every day, or as often as you can afford.
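One simple way to sketch that day scoping (names are mine): key the outer structure by date and drop old dates on demand, rather than running a timed job:

```java
import java.time.LocalDate;
import java.util.*;

class DailyVisitLog {
    // date -> (pageId -> (userId -> visits)); old dates are purged on demand
    private final Map<LocalDate, Map<Integer, Map<Integer, Integer>>> byDay = new HashMap<>();

    void addLog(LocalDate day, int pageId, int userId) {
        byDay.computeIfAbsent(day, d -> new HashMap<>())
             .computeIfAbsent(pageId, p -> new HashMap<>())
             .merge(userId, 1, Integer::sum);
    }

    // The "background job": drop every day strictly before `keepFrom`.
    void purgeBefore(LocalDate keepFrom) {
        byDay.keySet().removeIf(d -> d.isBefore(keepFrom));
    }

    Map<Integer, Map<Integer, Integer>> visitsOn(LocalDate day) {
        return byDay.getOrDefault(day, Map.of());
    }
}
```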
Your solution is the same as the previous solution posted. @gilles_monnier@yahoo.fr rightly pointed out: what if the same page is visited by multiple users? Keying the map by pageId would overwrite the map of <userId, NumberOfVisits>.
The solution could be Map<PageId, List<Map<userId, visits>>>. But, as suggested below, I don't think it's the most optimal solution.
If the same page is visited by multiple users, you just increment the visit count for each of the users on that page. I don't think you need a list of maps like you've pointed out.
cyborg is talking about the overwrites... you can't handle it with your solution. Each time a page is visited by a user, the value for the pageId key is overwritten.
I think qakbot100's solution does not have the problem you pointed out. When the page is visited for the very first time by user A, we create a map entry with the page id and a new HashMap that contains <A, 1>. If later the same page is visited by a second user B, instead of overwriting the HashMap<userId, visits>, we insert a new record <B, 1> into that map. And if user A comes to visit a second time, we locate the entry inside HashMap<userId, visits> and increment visits by 1. Please let me know if I got this wrong.
Not sure if that would lead to the most efficient solution...
The granularity of the time seems to be 1 day, but let's assume "any time range".
So for each given "time range" we should be able to build the graph of users and pages, where each link is non-directional with the assigned cost of # of times the user visited that page:
        /-- (visited 3 times) -- pageT
userX --+-- (visited 2 times) -- pageY
        \-- (visited 5 times) -- pageZ
Any graph gurus?
Looks to me like a weighted graph is most suitable. Vertices are users and pages.
- Anonymous February 03, 2015
Weights are the numbers of visits per day for a specific user and page.
It can be represented as this 2D matrix (per day):
      | Page1 | Page2 | Page3 | ...
User1 |   1   |   0   |   2   | ...
User2 |   0   |   6   |   2   | ...
User3 |   3   |   2   |   4   | ...
...
As for how to store this efficiently, I'm not sure.
It could be an array in combination with linked lists, or a HashTable in combination with linked lists...
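As a concrete (if memory-hungry) baseline, the matrix could simply be an int[][] in Java, assuming dense integer IDs; the names below are mine:

```java
class VisitMatrix {
    // visits[u][p] = times user u visited page p on this day
    private final int[][] visits;

    VisitMatrix(int users, int pages) {
        visits = new int[users][pages];
    }

    void visit(int user, int page) { visits[user][page]++; }

    int count(int user, int page) { return visits[user][page]; }
}
```

With sparse traffic (most users never touch most pages), the hash-map variants discussed earlier in the thread would use far less memory.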