Amazon Interview Question
SDE-2 | Country: India
To scale it, we can deploy servers that serve requests regionally or by domain suffix, e.g. .in vs .com: requests coming in for .in and .com will be served from different servers.
For region-wise deployment, requests are served from a region-wise cache; each cache holds the trending news of a particular region for each topic.
A user subscribes to a number of topics. When the user logs in, trending news for his/her subscribed topics is fetched from the region-wise cache for each topic and served.
Now, we need to update the region-wise cache frequently. A cron job keeps running in the background that fetches news from the news feeds, aggregates news from the different feeds, and updates the region-wise cache.
By breaking the system up region-wise, it is easy to scale, since load is partitioned by region rather than spread across all regions, and each region-wise cluster can be scaled independently.
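A minimal sketch of the suffix-based routing described above. The cluster names (`ap-south-cluster` etc.) and the function name are illustrative assumptions, not part of the original design:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical router: maps a domain suffix (".in", ".com", ...) to the
// regional server cluster that should serve the request.
std::string routeBySuffix(const std::string& host) {
    static const std::unordered_map<std::string, std::string> suffixToRegion = {
        {".in", "ap-south-cluster"},
        {".com", "us-east-cluster"},
    };
    for (const auto& [suffix, region] : suffixToRegion) {
        if (host.size() >= suffix.size() &&
            host.compare(host.size() - suffix.size(), suffix.size(), suffix) == 0) {
            return region;
        }
    }
    return "default-cluster";  // fall back when no known suffix matches
}
```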
With some assumptions, here is a pull-model sketch:
class newsaggregator
{
    User user;
public:
    bool login(std::string userId, std::string password);
    // Holds timers and refreshes the result after each timer expiry;
    // fetches from the server based on user credentials.
    void aggregateResult();
    void addSource(std::string sourceName);
    void addCategory(std::string category, int freqMinutes = 5); // default: 5 min
};
class User
{
    std::list<Source> sources;
public:
    bool validateUser(std::string userId, std::string password);
    void addSource(std::string sourceName);
    void addCategory(std::string category, int freqMinutes);
};
class Source
{
    std::list<std::string> sources;
    double frequencyMinutes; // e.g. 1 min
    std::map<std::string, std::list<std::string>> news; // category -> headlines
public:
    void daemon(); // after timer expiry, query each source and update news
    std::list<std::string> retrieveNews(std::string source, std::string category); // interface for newsaggregator
    void addNewSource(std::string source); // add the source if it is new
};
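To make the Source cache above concrete, here is a runnable sketch of how `retrieveNews` could be backed by a per-source, per-category map that the daemon refreshes on its timer. The class and method names beyond those in the sketch are assumptions, and the data is illustrative:

```cpp
#include <list>
#include <map>
#include <string>

// Sketch of the cache behind Source::retrieveNews: news items are stored
// under a "source|category" key; the timer-driven daemon would call
// update() to refresh these entries.
class SourceCache {
    std::map<std::string, std::list<std::string>> news_;
public:
    void update(const std::string& source, const std::string& category,
                const std::string& headline) {
        news_[source + "|" + category].push_back(headline);
    }
    std::list<std::string> retrieveNews(const std::string& source,
                                        const std::string& category) const {
        auto it = news_.find(source + "|" + category);
        return it == news_.end() ? std::list<std::string>{} : it->second;
    }
};
```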
Good design. A few doubts: 1) Why do we need the User class? We could have add/remove source methods in the newsaggregator class. 2) For scaling, how do you handle multiple machines or servers that aggregate the news and merge it?
These were very naive initial thoughts, which could be made more specific through discussion in an interview. To answer your questions:
1) Just as in Google News, we have authentication, and user preferences are saved for future interactions: whenever we log in, it aggregates news according to our preferences. The newsaggregator class provides the interface only for external users. That interface then calls the User class to save preferences and calls the Source class to add a new source if it doesn't exist already. The Source class will also update its database (or, more technically, learn from its users) with new inputs it did not have initially. So newsaggregator is effectively a single instance per user.
2) For scaling to multiple machines, we can use a distributed hash table (consistent hashing). Whenever there is a request to fetch from source S1, the corresponding hashed server will be asked for source S1.
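A minimal sketch of the consistent-hashing idea, assuming `std::hash` as the hash function (a production ring would use virtual nodes and a stable hash). Server names are illustrative:

```cpp
#include <functional>
#include <map>
#include <string>

// Minimal consistent-hash ring: each server is placed at its hash position
// on a ring; a source key is served by the first server at or after its own
// hash, wrapping around the ring.
class HashRing {
    std::map<std::size_t, std::string> ring_;  // hash position -> server
    std::hash<std::string> hash_;
public:
    void addServer(const std::string& server) { ring_[hash_(server)] = server; }
    std::string serverFor(const std::string& sourceKey) const {
        std::size_t h = hash_(sourceKey);
        auto it = ring_.lower_bound(h);            // first server at or after h
        if (it == ring_.end()) it = ring_.begin(); // wrap around the ring
        return it->second;
    }
};
```

The benefit over plain modulo hashing is that adding or removing one server only remaps the keys between it and its ring neighbor, not the whole keyspace.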
@Saurabh: Any ideas on how to scale this? Say the number of sources has increased manifold.
Also, in the general case, how would a news aggregator ever work on a push mechanism? Is it that each news source would "push" its state/news/information to a preconfigured server/location?
I think it should very much depend on the source's server itself, whether it supports pushing its info to anyone who has registered with it.
In the case of a push model, there must be some kind of event module running on the aggregator that receives notifications from the server and updates its database.
How can we get information if we are not allowed to pull or push (a page crawl is a kind of pull)?
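A sketch of that event module, under the assumption that sources deliver items through a registered callback instead of being polled on a timer. All names and data here are illustrative:

```cpp
#include <string>
#include <vector>

// Push-model aggregator sketch: sources call onNewsPushed() when they have a
// new item, instead of the aggregator pulling on a schedule.
class PushAggregator {
    std::vector<std::string> store_;  // stands in for the aggregator database
public:
    // Invoked by a source (or by a notification listener) on each new item.
    void onNewsPushed(const std::string& source, const std::string& headline) {
        store_.push_back(source + ": " + headline);  // update local state
    }
    std::size_t itemCount() const { return store_.size(); }
};
```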
- SK, January 26, 2014
I can think of only this: catching broadcast information, which could then be aggregated.
Can anyone shed further light?