Pega Interview Question
Software Engineer / Developers
Country: India
Interview Type: In-Person
I'm assuming there is no ordering of tasks. To ensure high throughput, you need a queue to hold pending tasks. Every VM runs an application that polls the queue for work, executes it, reports the result in some fashion, and loops forever. Of course, you would need someone to enqueue the tasks in the queue. You would also want to invent some concept of batching identical tasks together so that resource utilization is maximized. The good thing about a pool of executors is that you can scale up or down with the load very easily.
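To make the poll-execute-loop idea concrete, here is a minimal single-process sketch in Java. It assumes an in-memory `BlockingQueue` stands in for the real distributed queue and threads stand in for the VMs; `WorkerPoolSketch`, `runAll` and the `STOP` marker are names I made up for illustration.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class WorkerPoolSketch {
    // Hypothetical "poison pill": tells a worker to leave its polling loop.
    static final Runnable STOP = () -> { };

    // Runs every task on a pool of `workers` pollers and returns how many tasks completed.
    static int runAll(List<Runnable> tasks, int workers) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        AtomicInteger done = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        Runnable task = queue.take(); // block until work arrives
                        if (task == STOP) {
                            return;
                        }
                        task.run();
                        done.incrementAndGet();       // "report the result"
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        queue.addAll(tasks);               // the producer side
        for (int i = 0; i < workers; i++) {
            queue.add(STOP);               // one stop marker per worker
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }
}
```

In a real deployment the workers would loop forever instead of stopping on a marker, and scaling up or down would just mean starting or stopping more pollers.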
If the tasks have priority, the simplest solution is to use multiple queues, one for each priority band. The task executors look at the highest-priority queue first, then the second, and so on. There is a risk of starvation, though: if the higher-priority queues never drain, low-priority tasks may never run.
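A rough sketch of the one-queue-per-band idea, again in-memory and with made-up names (`PriorityBandsSketch`, `enqueue`, `pollNext`); band 0 is assumed to be the highest priority.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class PriorityBandsSketch {
    // Index 0 is the highest-priority band (an assumption of this sketch).
    private final List<Queue<String>> bands = new ArrayList<>();

    PriorityBandsSketch(int bandCount) {
        for (int i = 0; i < bandCount; i++) {
            bands.add(new ConcurrentLinkedQueue<>());
        }
    }

    void enqueue(int priority, String task) {
        bands.get(priority).add(task);
    }

    // Scans the bands from highest to lowest priority and returns the first
    // task found, or null if every band is empty.
    String pollNext() {
        for (Queue<String> band : bands) {
            String task = band.poll();
            if (task != null) {
                return task;
            }
        }
        return null;
    }
}
```

Note that `pollNext` only reaches a lower band when every band above it is empty, which is exactly where the starvation risk comes from. One possible mitigation (not shown) would be to occasionally serve a lower band first.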
The queue is interesting. It is required because there is no guarantee how long a particular task would take. Immediately, there's an impedance mismatch between the producers and the consumers. There will inevitably be times when the producers generate tasks faster than all the consumers combined can process them. The queue needs to be fault tolerant as well and needs to have a defined delivery guarantee. Typically such applications would use queue implementations such as Amazon SQS. Standard SQS queues guarantee at-least-once delivery, which means the same task may be delivered more than once. The executors need to ensure that a duplicate delivery does not cause the work to be applied twice (look up idempotency in distributed systems).
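One simple way to tolerate duplicate deliveries is to give every task an ID and skip IDs that have already been handled. The sketch below assumes that, and keeps the set of seen IDs in memory; `IdempotentExecutorSketch` and `handle` are hypothetical names, and with several VMs the set would have to live in some shared, durable store instead.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class IdempotentExecutorSketch {
    // IDs of tasks already handled (in-memory only; an assumption of this sketch).
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    // Runs `work` only the first time a given task ID is delivered.
    // Returns true if the work ran, false if the delivery was a duplicate.
    boolean handle(String taskId, Runnable work) {
        if (!seen.add(taskId)) {
            return false; // duplicate delivery: ignore it
        }
        work.run();
        return true;
    }
}
```

This version marks a task as seen before running it, so a crash in the middle of `work` would lose that task. Recording the ID only after success makes the opposite trade-off (the task may run twice), which is why making the work itself idempotent is usually the safer goal.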
eugene.yarovoi:
It actually doesn't matter if the VMs reside on the same host or not. If they did, you would have a higher probability of correlated failure (one host going down takes more than one VM out of service). You need to design for reliability anyway.
"You would also want to invent some concept of batching identical tasks together so that resource utilization is maximized. "
Define "identical tasks". What exactly do you mean here?
"The queue is interesting. It is required because there is no guarantee how long a particular task would take."
Not sure I'm following your train of thought there. It's certainly not required, though maybe desired. What sort of situation are you contrasting this with? One where you can try to partition the tasks into roughly equal sets because they have known durations?
"It actually doesn't matter if the VMs reside on the same host or not."
I'm not so sure about that one. If the JVMs are all sitting on one machine that has no multiprocessing or anything like that, it might be wiser for it to not waste memory by loading the same classes in many different JVMs. I'm not saying this is a good strategy all the time, but I could see certain situations where running everything on one JVM might give the best performance. So the answer to such a question is not completely irrelevant. I agree, however, that the interviewer was probably looking for some sort of distribution strategy and was probably thinking of these JVMs being on multiple machines or at least on a machine with multiprocessing capabilities.
- These tasks can be executed in any order.
- Each JVM resides on a separate machine.
- The interviewer also had a follow-up question: if at some point I want to abort this job, how can that be achieved?
JMX is your answer. You would create JMX agents on all servers; each agent carries out its tasks and returns a status once they are complete. Each agent also exposes an operation for cancelling a task. Every JVM node is configured with this agent along with other settings, such as how many worker threads it should have. Then you create a central JVM running a JMX manager to which all these agents report. The manager starts all the agents and is responsible for allocating tasks to them so that all JVMs are equally occupied. The manager can also provide a function to cancel a task, which communicates with each agent to cancel and stop execution of that task.
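A minimal sketch of what the agent's cancel operation could look like as a standard MBean. To keep it self-contained, the agent and the "manager" call both run in one JVM against the platform MBean server; a real manager would reach each agent through a remote JMX connector. The names `JmxCancelSketch`, `TaskAgent`, `TaskAgentMBean` and the `example.tasks` domain are all made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxCancelSketch {
    // Management interface. Standard MBean convention: implementation class name + "MBean".
    public interface TaskAgentMBean {
        int getCompletedTasks();
        boolean isCancelled();
        void cancel();
    }

    public static class TaskAgent implements TaskAgentMBean {
        private final AtomicBoolean cancelled = new AtomicBoolean();
        private final AtomicInteger completed = new AtomicInteger();

        // Worker loop: checks the cancel flag before each task.
        void runTasks(int count) {
            for (int i = 0; i < count && !cancelled.get(); i++) {
                completed.incrementAndGet(); // stand-in for real work
            }
        }

        @Override public int getCompletedTasks() { return completed.get(); }
        @Override public boolean isCancelled() { return cancelled.get(); }
        @Override public void cancel() { cancelled.set(true); }
    }

    // Registers an agent, runs some tasks, cancels through JMX, and tries to run more.
    static int demo() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            ObjectName name = new ObjectName("example.tasks:type=TaskAgent,node=1");
            TaskAgent agent = new TaskAgent();
            server.registerMBean(agent, name);
            try {
                agent.runTasks(3);
                // The "manager" side: invoke the cancel operation by name.
                server.invoke(name, "cancel", new Object[0], new String[0]);
                agent.runTasks(5); // no further tasks run once cancelled
                return (Integer) server.getAttribute(name, "CompletedTasks");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The cancel here is cooperative: the flag is only checked between tasks, so a task that is already running finishes first. Interrupting a task in progress would need extra handling, for example interrupting the worker thread.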
What sorts of tasks? Can these tasks be done in any order? Does each JVM reside on a separate machine, or in such an environment that it gets its own resources?
- eugene.yarovoi October 02, 2012