Expedia Interview Question
Country: United States
How would hashing benefit in any way? We are looking for a max count here.
A Hashtable could possibly be useful if you intend to query the number of tweets for a given movie as input, which is not the case here.
Unless you already have movie hashtag as a key to number of tweets for all the possible hashtags, you will need to hash all those key value pairs
Oh you meant we should create a hashtag when you said Hashing. I would just ask the interviewer how a tweet is identified to be that of so n so movie.
He may suggest hashtag or an id associated with the tweet or the movie name. All work for us. We construct our Tweet object accordingly.
could you please describe it in detail? I mentioned hashtable, but he was not happy with the answer.
He couldn't have been. HashMap would take up unnecessary space and not give a resolve to the question at hand.
We use a HashMap when we want a key value pair and the requirement is something like, how will you find out how many tweets a particular movie got.
Here what is asked is not the number of tweets a movie got, but the movie which got the max number of tweets, a Heap generally solves questions like find the maximum or minimum number in a lot of data (maybe a stream) or questions like find the 10 or 20 or 'k' movies with the maximum tweets.
Hope this helps!
Seems like your heap is updated based on the movie occurrence count ( Movie [] moviesHeap ).
This does not solve the problem of "last 24 hours".
One way to solve this problem is to use addtional heap that stores tweets of last 24hours (Tweet [] tweetHeap).
when scanning for new tweet, insert method updates 2 heaps:
movieHeap
tweetHeap
while querying for movie which is tweeted max times in last 24hours:
remove all items from tweetHeap which are older than 24hours (and corresponding updates to moviesHeap) and return the root of moviesHeap.
We have to analyse the tweet dataset we got in past 24 hours :
Steps :
1. Search hashtags and then break it and then match the word with movie database which can be downloaded online.
For in-depth analysis follow following steps :
2. Once done remove hashtag and normalize the tweet dataset by removing the @ and lowercasing all the word. And then do lemmatization.
3. Select every tweet one by one and then break it into token and then search each word in correspondence with given database.
If you use python it will be lot easier bcoz of present libraries.
Tweet[] tweets = new Tweet[5];
tweets[0] = new Tweet(9, new DateTime(2017, 02, 21, 10, 01, 20), "C");
tweets[1] = new Tweet(2, new DateTime(2017, 02, 20, 22, 01, 20), "B");
tweets[2] = new Tweet(120, new DateTime(2017, 02, 20, 18, 00, 00), "D");
tweets[3] = new Tweet(1, new DateTime(2017, 02, 20, 05, 01, 20), "A");
tweets[4] = new Tweet(20, new DateTime(2017, 02, 21, 14, 01, 20), "E");
var sorted = tweets.OrderByDescending(x => x.TimeStamp.Date).ThenByDescending(y => y.TimeStamp.TimeOfDay);
DateTime now = DateTime.Now;
int max = tweets[0].Count;
Tweet t1 = null;
foreach (Tweet t in sorted)
{
if (t.TimeStamp > now.AddHours(-24) && t.TimeStamp <= now && max < t.Count)
{
max = t.Count;
t1 = t;
}
}
Console.WriteLine(t1.TimeStamp + " : " + t1.Name + "=" + t1.Count);
[TestMethod]
public void TweetsInLast24Hrs()
{
Tweet[] tweets = new Tweet[5];
tweets[0] = new Tweet(9, new DateTime(2017, 02, 21, 10, 01, 20), "C");
tweets[1] = new Tweet(2, new DateTime(2017, 02, 20, 22, 01, 20), "B");
tweets[2] = new Tweet(120, new DateTime(2017, 02, 20, 18, 00, 00), "D");
tweets[3] = new Tweet(1, new DateTime(2017, 02, 20, 05, 01, 20), "A");
tweets[4] = new Tweet(20, new DateTime(2017, 02, 21, 14, 01, 20), "E");
var sorted = tweets.OrderByDescending(x => x.TimeStamp.Date).ThenByDescending(y => y.TimeStamp.TimeOfDay);
DateTime now = DateTime.Now;
int max = tweets[0].Count;
Tweet t1 = null;
foreach (Tweet t in sorted)
{
if (t.TimeStamp > now.AddHours(-24) && t.TimeStamp <= now && max < t.Count)
{
max = t.Count;
t1 = t;
}
}
Console.WriteLine(t1.TimeStamp + " : " + t1.Name + "=" + t1.Count);
}
[TestMethod]
public void TweetsInLast24Hrs()
{
Tweet[] tweets = new Tweet[5];
tweets[0] = new Tweet(9, new DateTime(2017, 02, 21, 10, 01, 20), "C");
tweets[1] = new Tweet(2, new DateTime(2017, 02, 20, 22, 01, 20), "B");
tweets[2] = new Tweet(120, new DateTime(2017, 02, 20, 18, 00, 00), "D");
tweets[3] = new Tweet(1, new DateTime(2017, 02, 20, 05, 01, 20), "A");
tweets[4] = new Tweet(20, new DateTime(2017, 02, 21, 14, 01, 20), "E");
var sorted = tweets.OrderByDescending(x => x.TimeStamp.Date).ThenByDescending(y => y.TimeStamp.TimeOfDay);
DateTime now = DateTime.Now;
int max = tweets[0].Count;
Tweet t1 = null;
foreach (Tweet t in sorted)
{
if (t.TimeStamp > now.AddHours(-24) && t.TimeStamp <= now && max < t.Count)
{
max = t.Count;
t1 = t;
}
}
Console.WriteLine(t1.TimeStamp + " : " + t1.Name + "=" + t1.Count);
}
[TestMethod]
public void TweetsInLast24Hrs()
{
Tweet[] tweets = new Tweet[5];
tweets[0] = new Tweet(9, new DateTime(2017, 02, 21, 10, 01, 20), "C");
tweets[1] = new Tweet(2, new DateTime(2017, 02, 20, 22, 01, 20), "B");
tweets[2] = new Tweet(120, new DateTime(2017, 02, 20, 18, 00, 00), "D");
tweets[3] = new Tweet(1, new DateTime(2017, 02, 20, 05, 01, 20), "A");
tweets[4] = new Tweet(20, new DateTime(2017, 02, 21, 14, 01, 20), "E");
var sorted = tweets.OrderByDescending(x => x.TimeStamp.Date).ThenByDescending(y => y.TimeStamp.TimeOfDay);
DateTime now = DateTime.Now;
int max = tweets[0].Count;
Tweet t1 = null;
foreach (Tweet t in sorted)
{
if (t.TimeStamp > now.AddHours(-24) && t.TimeStamp <= now && max < t.Count)
{
max = t.Count;
t1 = t;
}
}
Console.WriteLine(t1.TimeStamp + " : " + t1.Name + "=" + t1.Count);
}
Use a Max Heap of something called 'Tweet' objects.
- teli.vaibhav September 04, 2015Each of these Tweet objects will have attributes like 'Movie Hashtag' which would be a unique identifier, another attribute called 'Number of Tweets' which would be used to decide the position of an object in the Max Heap.
At any given time the root element would be the most tweeted movie.
Solution is O(1) time.
However, the O(1) time is only for the query of the most tweeted movie. This comes with a space cost of O(n) and the additional O(log n) time each time a new movie is tweeted about to maintain & rearrange the heap.