Google Interview Question
Member Technical StaffsCountry: United States
In your solution, rehashing the table again and again will result in bad performance...
We do not put in db or in no sql? There are millions of posts and trillions of likes, canyou imagine puytting them in ADT and holding in RAM?
When a user click on an item object with a "like it" buttom
1. An variable associated with this item object is incremented.
2. An link(address) of the item object(or perhaps a picture as well) is copied from the object to a list of "likes" in the "user" object in question.
3. An link to the user is added to item object's list of users who "likes it"
The lists are implemented as arrays and dynamically expand themselves as need be
------------------------update on Jan/23----------
I rethought about the question
1.the thing about "likes" is that it could be any thing from 0 to billions. We probably don't have to store all the links in their complete form, after all ,who will ever exam all the millions of people who clicked on likes. We could probably code the links for storage and decode them only when necessary.
2. The essence of likes is but a combination (u,i) in which u is an user and i is a likeable item. Considering the huge number of likes, and the already accepted delayed speed of web, perhaps instead of storing this information in user object as well as in the "liked" object, we could centralize it and store this combinational relationship in one place, and search for it when the item or the user request information. Of course a great amount of searching will be necessitated, but we can arrange the information by date, and give precedence to more recent data. Judging by the characteristics of Facebook, I don't think many users would EVER request data from...say 5 years ago. So we don't search further than 5 years unless the user explicitly asks
By the way, the consideration that people will only actually look at a small amount of the data and most of the time only the very recent ones is why I had proposed lists instead of hash. The list is long, but you only need to traverse the begin part! the outdated data should be encoded and throw into archive, which should be expansive in size, efficient in storage organization but not necessarily speedy.
Thats the best I could make up.
To Simply fy it :
User 1 -> L1->L2->L3->L4 .....
User 2 -> La->Lb->Lc ...
Object 1 -> User1 -> User 2 -> User 3->...
(Picture/Post etc)
Object 2 -> User1 -> User 2 -> User 3->... etc
They were not happy with this.
@Bertranreddy: there needs to be far more discussion of what the requirements are of this "like" feature before it makes sense to talk about implementation details. Do we have to add stories to users' pages about things they liked? Or are the likes more like just a counter? Is it important to be able to see who liked a particular thing?
If, after examining the requirements carefully, we determine that an array or linked list would be a reasonable in-memory data structure, we still need to talk about how we're going to persist this data (we want to actually save this data to disk, not just store it in RAM), and then maybe also replicate it throughout some sort of distributed system.
Linked list is not space efficient. Also keep in mind that Facebook needs to serialize this data to persistent storage and read from there as needed. They may have had that in mind when asking the question. I wouldn't be surprised if in reality likes are read and displayed without being kept in memory at any time so the discussion of in memory data structures may be irrelevant.
@barcod : You are right that it is irrelevant .Yet although I said "list", I never suggested linked list , I say dynamically allocated arrays( arrays of pointers); there should be a small number of space statically given to each object("user" or "likeable") but the array could be reallocated to a longer array if certain indicators expect a huge number of "likes" for an item or "liked" for an user
All type of likes can not be stored in a single type of data model, so I think facebook does is following :
there are pages or advertisements, where we like them, those type of likes are stored to the user itself, Each user have a list which contain all the pages and advertisements which a user likes, as there can be countable number of pages, but users are finite, so we can store the id's of these pages, or advertisements.
Now what about other likes for example likes for a post(any type of post, text, images, videos etc.), from my thinking likes on these posts are stored with their parent page, along with post liked users id is also stored.
My personal view, this must be implemented by database as some people mentioned above. Each user was stored in a table with an ID and each topic has the same. The User information and topic information would be stored only once. When people click "like", one entry (user ID, topic ID, like or 1)would be inserted to another table. Only this way the system could be scalable. It would be too expensive to store such information in a individual list or hash table.
I really don't understand what's all this inspace thing all about.
Do you really believe that a site such as fb will be storing the relationship between entities totally in memory?
There would be a table POSTS, another USERS and LIKES.
There is a many to many rel between POSTS and USERS so there is a link table POSTS_USERS.
POSTS to LIKES is 1-many so LINKS table would have things like USER_ID, POST_ID and of course ID.
Now memcached enters.
A user logs in, his details are queried from the db and put in cache.
Of course, this is a hashtable.
class Like {
Post parent = null;
HashMap<Integer, UserInfo> hm = new HashMap<Integer, UserInfo>();
public Like(Post post) {
parent = post;
}
public updateLikes(UserInfo info) {
if (hm.contains(info.id)) {
info.likes.remove(this);
hm.remove(info.id);
}
else {
info.likes.add(this);
hm.put(info.id, info);
}
}
public int getLikeCount() {
return hm.size();
}
public ArrayList<UserInfo> getUserList() {
ArrayList<UserInfo> users = new ArrayList<UserInfo>();
Set<Map.Entry<Integer, UserInfo>> userSet = hm.entrySet();
for (Map.Entry<Integer, UserInfo> user : userSet) {
users.add(userSet);
}
return users;
}
}
Use NoSQL solution (MongoDB kind)
1. Have two document stores called posts and the other called liker
2. Posts would have post_id, post_title, post_desc, post_owner, liker as array of objects, comments as array of objects and so on and so forth.
3. Liker would have liker_id, post_id and so on and so forth.
4. Everytime a post is posted, we create a document and store in Posts document store.
5. Everytime a post is liked, we retrieve the post and add the liker in the list
This is a distributed solution so it may not be consistent right away, but is eventual consistency, because Availability is more important.
Likes and posts need not be superfast and consistent all the time.
A linked list is way too inefficient!
- a3 January 20, 2013Every post P can have a hash table of id's of people liking it.
Every user U can have a hash table of id's of posts he liked.
Inserting is O(n) since it's inserting in an ordered hash.
Deleting (unlike button) will be deleting from a hash, which is O(n) as well.
Searching will be O(log n).
Space: for every like you would need to increase both hash tables by 1 in size. So, if k is the size of the entry (the space used in memory for the id, for instance 2bytes), then the space is increased in 2k per like.