Amazon Interview Question
Principal Software Engineers
Country: India
Interview Type: Phone Interview
A max heap will do the job.
Suppose we have a set of processes {T1, T2, T3, ..., Tn}.
When a new process Ti arrives, preempt the running process and insert the preempted process into the max heap with priority 0. Ti then runs for a defined time slice 'slice'; after that, take the maximum-priority process from the max heap and run it.
And one more thing: before inserting any element, increase the priority of every element already in the max heap by one, so that long-waiting processes age upward and don't starve.
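A minimal Python sketch of this scheme (the class name and the `time_slice` parameter are mine; the aging-by-one and priority-0 insertion rules come from the answer above):

```python
import heapq

class AgingScheduler:
    """Preempt-on-arrival scheduler with a max heap of waiting processes.
    Priorities are stored negated because heapq is a min-heap."""

    def __init__(self, time_slice):
        self.time_slice = time_slice  # how long a newcomer runs before yielding
        self.heap = []                # entries are (negated_priority, name)

    def on_arrival(self, running, new_proc):
        """A new process arrives: age all waiters, park the preempted
        process at priority 0, and let the newcomer run."""
        # Aging: raise every waiter's priority by one (subtract one, negated).
        self.heap = [(p - 1, name) for p, name in self.heap]
        heapq.heapify(self.heap)
        if running is not None:
            heapq.heappush(self.heap, (0, running))
        return new_proc

    def next_process(self):
        """After the time slice expires, resume the highest-priority waiter."""
        if not self.heap:
            return None
        _, name = heapq.heappop(self.heap)
        return name
```

With arrivals T1, T2, T3, the oldest waiter (T1) has aged the most and resumes first once the slice ends.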
The job has the following attributes:
- start time(dd-mm-yyyy hh:mm:ss)
- occurrence rule: once, daily, monthly, weekly, etc.
- a pointer to the job's work, e.g. a command string consisting of the program name and its arguments
The jobs list has 10M entries, which seems too large to keep in memory all the time, so save it in a DB or a distributed DB.
Once per day the agent performs the following:
- go through all the jobs in the DB
- select the jobs that should start before the next day; a job's next start time can be calculated from the current time, the job's start time, and its occurrence rule
- for each job found, start a new thread that sleeps until the job's start time, runs the job, and then terminates
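The daily pass above can be sketched as follows (a sketch, not a full implementation: only the 'once' and 'daily' occurrence rules are shown, and `jobs` is a plain list of dicts standing in for the DB query result):

```python
import threading
import time
from datetime import datetime, timedelta

def next_start(now, first_start, rule):
    """Compute a job's next start time from its first start time and its
    occurrence rule. Only 'once' and 'daily' are shown; 'weekly' and
    'monthly' would follow the same pattern."""
    if rule == "once":
        return first_start if first_start > now else None
    if rule == "daily":
        run = first_start
        while run <= now:
            run += timedelta(days=1)
        return run
    raise ValueError("unsupported rule: " + rule)

def schedule_today(jobs, now, run_job):
    """For each job due within the next 24 hours, start a thread that
    sleeps until the job's start time, runs it, and then terminates."""
    threads = []
    for job in jobs:
        start = next_start(now, job["start"], job["rule"])
        if start is not None and start <= now + timedelta(days=1):
            delay = max(0.0, (start - now).total_seconds())
            t = threading.Thread(
                target=lambda d=delay, j=job: (time.sleep(d), run_job(j)))
            t.start()
            threads.append(t)
    return threads
```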
The agent should also intercept new job insertions: if a new job should start before the next day, start a thread for it right away.
The agent should intercept job deletions and cancel the thread for the corresponding job if one was already started.
There is a system limit on the number of threads an app may run. If there are too many jobs to start today, the agent may partition them into X-hour-long intervals and start the new threads several times per day.
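The partitioning step can be sketched as a simple bucketing of today's start times (the `hours` parameter is the illustrative X; start times are given as seconds since midnight):

```python
from collections import defaultdict

def partition_starts(start_seconds, hours=4):
    """Group today's job start times (seconds since midnight) into X-hour
    buckets, so the agent spawns threads in several batches per day rather
    than all at once."""
    buckets = defaultdict(list)
    for s in start_seconds:
        buckets[s // (hours * 3600)].append(s)
    return dict(buckets)
```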
Very high-level answer with no code. But I suspect they're looking for something like this:
- Jason D., February 22, 2013

Let's assume we have access to a distributed NoSQL store for which we can configure CAP. We will configure it to be strongly consistent (C-P) to enable distributed locking. Our system will have two kinds of nodes: controller nodes and worker nodes. Controller nodes should be distributed across data centers, each controlling a set of worker nodes on its local network (i.e. same data center and ideally same rack).
Controller nodes will expose an API that permits:
- Uploading of a new script (or modification to an existing script) to be used in a job or jobs. You might support multiple scripting languages, in which case that would also be a parameter to creation. The script is placed in our strongly consistent store and a unique ID is returned.
- Creation of a new job, with scheduling parameters and the id of the script to be run. A record is created in our strongly consistent store and a unique ID is returned.
- Reporting (from a subordinate worker node) on a completed job. This will include information about performance characteristics like CPU time and memory utilization.
- Registration of a new, subordinate worker node
Worker nodes will expose an API that permits:
- Submission of a job to be executed by that worker node immediately
- Querying the worker node for its estimated, available compute resources (CPU, memory, etc.)
Controller nodes will host a daemon process that queries their known subordinate worker nodes every minute for available compute resources. The daemon then queries our C-P job store for jobs that need to run in the next minute, are not locked by another controller, and have historical compute characteristics matching subordinate worker node availability. It gets back some set of results, which it then attempts to lock. The jobs it locks successfully are submitted to worker nodes whose available resources match each job's requirements. The worker node runs the job, then reports success (and resource consumption) to its controller, which adjusts the job's compute requirements accordingly (by some algorithm TBD) and then releases its lock on the job.
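One tick of that daemon loop could look like the sketch below. The dict-based `jobs` list stands in for the strongly consistent job store, and setting `locked_by` stands in for the store's distributed lock (in a real C-P store this would be a conditional write):

```python
def controller_tick(jobs, workers, now, controller_id="controller-1"):
    """One pass of the controller daemon: lock jobs due in the next minute
    and dispatch each to a worker with enough spare capacity."""
    dispatched = []
    for job in jobs:
        due_soon = job["next_run"] <= now + 60       # due within the next minute
        if not due_soon or job["locked_by"] is not None:
            continue                                 # not due, or held by another controller
        job["locked_by"] = controller_id             # acquire the lock
        worker = next((w for w in workers
                       if w["free_cpu"] >= job["cpu_est"]), None)
        if worker is None:                           # no worker has capacity:
            job["locked_by"] = None                  # release and retry next tick
            continue
        worker["free_cpu"] -= job["cpu_est"]         # submit to the worker node
        dispatched.append((job["id"], worker["id"]))
        # ...the worker would run the job and report resource consumption, and
        # the controller would adjust job["cpu_est"] before releasing the lock.
        job["locked_by"] = None
    return dispatched
```

Note how a job that no worker can currently fit is unlocked again rather than held, so another controller (or the next tick) can pick it up.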
I'm glossing over quite a few things here, most notably heartbeat. Sometimes controllers will go
down before releasing a lock. Sometimes workers will go down before finishing a job. Your C-P
store will probably need a locking host IP and time-of-lock, so that another controller can call
shenanigans and try to ping the locking controller if it thinks something is amiss. That controller,
if online, can then try to ping its worker.