Google Interview Question
Software EngineersCountry: United States
Interview Type: Phone Interview
To avoid infinite loop, could track the status (perhaps an enum) of each task in a Map (ex. IN_PROGRESS, RUNNING, etc) instead of just Sets of Tasks. And if we attempt to execute a dependency that is already IN_PROGRESS, throw some error.
Well, I think one way to deal with cycles is to add another set inProcess, if we hit a cycle, i.e., the inProcess set has already had that element, just execute that element first.
public class TaskScheduler {
Set<Task> executed;
Set<Task> allTasks;
Set<Task> inProcess;
public TaskScheduler(Set<Task> tasks){
allTasks = tasks;
executed = new HashSet<Task>();
inProcess = new HashSet<Task>();
}
public void schedule(Set<Task> allTasks){
for(Task t : allTasks){
if(executed.contains(t))
continue;
if(!inProcess.isEmpty() && inProcess.contains(t)){
t.Run();
inProcess.remove(t);
executed.add(t);
continue;
}
inProcess.add(t);
schedule(t.GetDependencies());
t.Run();
inProcess.remove(t);
executed.add(t);
}
}
}
OK, I'm going to write this in Python for simplicity with this task interface:
class Task(object):
def run():
pass
def get_dependencies():
pass
So each task has a run method that does something and a get_dependencies method that gives us a list of tasks which need to be run before this task. We're given a list of tasks and asked to run each task only once but respecting the dependencies of each task.
The initial thing to try would be to simply loop through our tasks, run all the dependent tasks for each task, and then run that task. However, we don't want to run any task more than once, so we need to keep track of which tasks we've already run and not run them (or their dependencies) again. We'll use a set (hashmap) to keep track of this - if the task is already in the set, we've run it and we won't need to worry about it.
def run_tasks(tasks):
visited = set()
def sub_run(tasks):
for task in tasks:
if task in visited:
continue
sub_run(task.get_dependencies())
task.run()
visited.add(task)
sub_run(tasks)
This approach works pretty well - in fact, we're only going through each task once, and through each dependency list once (the first time we see it), so the runtime is O(V+E), where V is the number of tasks and E the number of dependencies. However, it's got a big problem. This function works fine as long as the input is good - but if we give it a circular dependency, it'll loop forever rather than failing gracefully. Even if we caught the loop later, we may have already run a number of tasks before hitting the cycle, and we probably want to fail first, before we do anything. So how to solve this?
Well, this is a standard graph problem on directed, acyclic graphs - it's called topological sorting. Before we run any of the tasks, we'll come up with a topological sort of the tasks, or an ordering of the tasks that respects all the dependencies. If we can't come up with a valid ordering, that means there's a cycle in the graph and we'll fail gracefully there. Then once we have the ordering, we can simply run them all in that order.
Here's the method. We'll need several supplementary data structures. First, we'll make a dictionary mapping tasks to their order in the input list so we can quickly look them up. Then we'll make a list that stores a count of how many dependencies (or incoming edges, in graph-speak) each task has. We'll also create a list of lists where each list stores all the tasks that are depending on each task (or an adjacency list, again in graph-speak). Finally, any tasks with no dependencies will get added to a ready list, and we'll initialize an empty ordering list that we'll fill up as we go.
Then we do this: first, pick a ready task from the ready list. Put it in the ordering list as the next task in the order, and then go through each task that is depending on this task and reduce its dependency count by 1. If this reduces it to zero, then that task is ready and we push it into the ready list. Repeat until the ready list is empty. If our final ordering list contains all of the tasks, then we've got a valid order and we'll just go through and run all the tasks in that order. If it doesn't, then the remaining tasks must have a cycle, and we can raise an exception or however we want to deal with the problem. Here's the code:
def run_tasks(tasks):
#Setup
ids = {}
for i, task in enumerate(tasks):
ids[task] = i
incoming = []
adj_list = [[] for task in tasks]
ready = []
ordering = []
for task in tasks:
deps = task.get_dependencies()
incoming.append(len(deps))
for dep in deps:
adj_list[ids[dep]].append(ids[task])
if len(deps) == 0:
ready.append(ids[task])
#Do the sort
while ready:
next_task = ready.pop()
ordering.append(next_task)
for k in adj_list[next_task]:
incoming[k] -= 1
if incoming[k] == 0:
ready.append(k)
#Run the tasks
if len(ordering) < len(tasks):
print 'Cycle Detected!'
else:
for k in ordering:
tasks[k].run()
The other big benefit of this approach is that depending on how we implement the "ready" data structure, we can control *which* valid task ordering we pick. We might prefer to have certain tasks finished earlier, while other tasks might not be as important. If we implement the "ready" structure with a priority queue, we can guarantee that the top priority task will get run as soon as possible.
Answers provided work pretty good but they end up with possible O(n!), imagine a list where each task depends on another task except one, each go for the array we will just be taking n down by 1, so you are looking at possibly (n)(n-1)(n-2) etc.
Also what if there is no solution because all tasks depend on each other, the provided solutions would enter an infinite recursion.
My idea is to sort your ask based on dependence and just go through them one by one, would also need to make sure no task is both below and above you in sort order at anytime.
That would work with O(n log(n))
The solution, any solution, to this problem will only work on a DAG. It is true that the current answers don't take into account that the input could contain cycles and would thus fail horribly on such inputs.
The running time is O(V + E), which for unfortunate choices of sets could lead to weak performance. But I fail to see the O(n!) runtime complexity, please follow up by giving an example set.
This set:
{t5, t4, t3, t2, t1}
t5 -> {}
t4 -> {t5}
t3 -> {t4, t5}
t2 -> {t3, t4, t5}
t1 -> {t2, t3, t4, t5}
Will consider all edges of all vertices, but it will still execute each vertex and it's dependencies only once. Even if we only consider the number of visits it shouldn't even come close to O(n!).
Transformed the Java interface into a C++ abstract class.
class Task {
public:
virtual void run() = 0;
virtual vector<Task *> & getDependencies() = 0;
};
The implementation is quite easy, however take note of the first rule. A task could contain dependencies that are also a dependency of another task. Mark each task as you traverse through the tasks, so you won't execute a task twice.
class TaskScheduler {
using vec_task_t = vector<Task *>; // Input
using mark_set_t = unordered_set<Task *>; // Hash set of task
mark_set_t marked_tasks;
public:
void runTasks(vec_task_t & tasks) {
for (Task * t : tasks) {
if (marked_tasks.find(t) == marked_tasks.end()) {
runTasks(t->getDependencies());
t->run();
marked_tasks.insert(t);
}
}
}
};
public class TaskScheduler {
Set<Task> tasks = new HashSet<>();
Set<Task> visitedTasks = new HashSet<>();
public void TaskScehduler(Set<Task> tasks) {
this.tasks = tasks;
}
public void ExecuteTasks(Set<Task> tasks) {
for( Task task : tasks) {
if (!visitedTasks.contains(task)) {
Set<Task> subtasks = task.GetDependencies();
if (subtasks.size() > 0)
ExecuteTasks(subtasks);
task.Run();
visitedTasks.add(task);
}
}
}
}
Here's some full code in Java (building a bit off what others wrote before). Please post below if you have any comments on the implementation, if you think something is inefficient, unnecessary, could be done another way, etc. I chose to use a private static executeTasks method that does the bulk of the work. The code throws an Exception if circular dependencies are detected. Perhaps in the real world, it should try to detect circular dependencies first before executing any of the tasks.
import java.util.Set;
interface Task {
void Run();
void AddDependency(Task task);
Set<Task> GetDependencies();
}
import java.util.Set;
import java.util.HashSet;
public class SimpleTask implements Task {
private String str;
private Set<Task> dependencies;
public SimpleTask(String str) {
this.str = str;
this.dependencies = new HashSet<Task>();
}
public void Run() {
System.out.println(str);
}
public void AddDependency(Task task) {
dependencies.add(task);
}
public Set<Task> GetDependencies() {
return dependencies;
}
}
import java.util.Set;
import java.util.HashSet;
import java.util.HashMap;
enum TaskStatus {
EXECUTING_DEPENDENCIES, EXECUTED
}
public class TaskScheduler {
private Set<Task> tasks;
private HashMap<Task, TaskStatus> status;
public TaskScheduler() {
this.tasks = new HashSet<Task>();
this.status = new HashMap<Task, TaskStatus>();
}
public void addTask(Task task) {
tasks.add(task);
}
public void executeTasks() {
try {
executeTasks(tasks, status);
} catch (Exception e) {
System.out.println("Execution of tasks incomplete: " + e.getMessage());
}
}
private static void executeTasks(Set<Task> tasks, HashMap<Task, TaskStatus> status) throws Exception {
for (Task task : tasks) {
if (!status.containsKey(task)) {
status.put(task, TaskStatus.EXECUTING_DEPENDENCIES);
executeTasks(task.GetDependencies(), status);
task.Run();
status.put(task, TaskStatus.EXECUTED);
} else if (status.containsKey(task)
&& status.get(task) == TaskStatus.EXECUTING_DEPENDENCIES) {
throw new Exception("Circular dependencies detected!");
}
}
}
public static void main(String[] args) {
Task t1 = new SimpleTask("one");
Task t2 = new SimpleTask("two");
Task t3 = new SimpleTask("three");
Task t4 = new SimpleTask("four");
Task t5 = new SimpleTask("five");
t1.AddDependency(t2);
t1.AddDependency(t5);
t2.AddDependency(t4);
t3.AddDependency(t2);
t3.AddDependency(t4);
System.out.println("Execute tasks (no circular dependencies):");
TaskScheduler scheduler = new TaskScheduler();
scheduler.addTask(t1);
scheduler.addTask(t2);
scheduler.addTask(t3);
scheduler.addTask(t4);
scheduler.addTask(t5);
scheduler.executeTasks();
System.out.println();
System.out.println("Execute tasks (circular dependencies):");
t2.AddDependency(t1);
TaskScheduler scheduler2 = new TaskScheduler();
scheduler2.addTask(t1);
scheduler2.addTask(t2);
scheduler2.addTask(t3);
scheduler2.addTask(t4);
scheduler2.addTask(t5);
scheduler2.executeTasks();
System.out.println();
}
}
Here's some full code in Java (building a bit off what others wrote before). Please post below if you have any comments on the implementation, if you think something is inefficient, unnecessary, could be done another way, etc. I chose to use a private static executeTasks method that does the bulk of the work. The code throws an Exception if circular dependencies are detected. Perhaps in the real world, it should try to detect circular dependencies first before executing any of the tasks.
import java.util.Set;
interface Task {
void Run();
void AddDependency(Task task);
Set<Task> GetDependencies();
}
import java.util.Set;
import java.util.HashSet;
public class SimpleTask implements Task {
private String str;
private Set<Task> dependencies;
public SimpleTask(String str) {
this.str = str;
this.dependencies = new HashSet<Task>();
}
public void Run() {
System.out.println(str);
}
public void AddDependency(Task task) {
dependencies.add(task);
}
public Set<Task> GetDependencies() {
return dependencies;
}
}
import java.util.Set;
import java.util.HashSet;
import java.util.HashMap;
enum TaskStatus {
EXECUTING_DEPENDENCIES, EXECUTED
}
public class TaskScheduler {
private Set<Task> tasks;
private HashMap<Task, TaskStatus> status;
public TaskScheduler() {
this.tasks = new HashSet<Task>();
this.status = new HashMap<Task, TaskStatus>();
}
public void addTask(Task task) {
tasks.add(task);
}
public void executeTasks() {
try {
executeTasks(tasks, status);
} catch (Exception e) {
System.out.println("Execution of tasks incomplete: " + e.getMessage());
}
}
private static void executeTasks(Set<Task> tasks, HashMap<Task, TaskStatus> status) throws Exception {
for (Task task : tasks) {
if (!status.containsKey(task)) {
status.put(task, TaskStatus.EXECUTING_DEPENDENCIES);
executeTasks(task.GetDependencies(), status);
task.Run();
status.put(task, TaskStatus.EXECUTED);
} else if (status.containsKey(task)
&& status.get(task) == TaskStatus.EXECUTING_DEPENDENCIES) {
throw new Exception("Circular dependencies detected!");
}
}
}
public static void main(String[] args) {
Task t1 = new SimpleTask("one");
Task t2 = new SimpleTask("two");
Task t3 = new SimpleTask("three");
Task t4 = new SimpleTask("four");
Task t5 = new SimpleTask("five");
t1.AddDependency(t2);
t1.AddDependency(t5);
t2.AddDependency(t4);
t3.AddDependency(t2);
t3.AddDependency(t4);
System.out.println("Execute tasks (no circular dependencies):");
TaskScheduler scheduler = new TaskScheduler();
scheduler.addTask(t1);
scheduler.addTask(t2);
scheduler.addTask(t3);
scheduler.addTask(t4);
scheduler.addTask(t5);
scheduler.executeTasks();
System.out.println();
System.out.println("Execute tasks (circular dependencies):");
t2.AddDependency(t1);
TaskScheduler scheduler2 = new TaskScheduler();
scheduler2.addTask(t1);
scheduler2.addTask(t2);
scheduler2.addTask(t3);
scheduler2.addTask(t4);
scheduler2.addTask(t5);
scheduler2.executeTasks();
System.out.println();
}
}
Here's some full code in Java (building a bit off what others wrote before). Please post below if you have any comments on the implementation, if you think something is inefficient, unnecessary, could be done another way, etc. I chose to use a private static executeTasks method that does the bulk of the work. The code throws an Exception if circular dependencies are detected. Perhaps in the real world, it should try to detect circular dependencies first before executing any of the tasks.
import java.util.Set;
interface Task {
void Run();
void AddDependency(Task task);
Set<Task> GetDependencies();
}
import java.util.Set;
import java.util.HashSet;
public class SimpleTask implements Task {
private String str;
private Set<Task> dependencies;
public SimpleTask(String str) {
this.str = str;
this.dependencies = new HashSet<Task>();
}
public void Run() {
System.out.println(str);
}
public void AddDependency(Task task) {
dependencies.add(task);
}
public Set<Task> GetDependencies() {
return dependencies;
}
}
import java.util.Set;
import java.util.HashSet;
import java.util.HashMap;
enum TaskStatus {
EXECUTING_DEPENDENCIES, EXECUTED
}
public class TaskScheduler {
private Set<Task> tasks;
private HashMap<Task, TaskStatus> status;
public TaskScheduler() {
this.tasks = new HashSet<Task>();
this.status = new HashMap<Task, TaskStatus>();
}
public void addTask(Task task) {
tasks.add(task);
}
public void executeTasks() {
try {
executeTasks(tasks, status);
} catch (Exception e) {
System.out.println("Execution of tasks incomplete: " + e.getMessage());
}
}
private static void executeTasks(Set<Task> tasks, HashMap<Task, TaskStatus> status) throws Exception {
for (Task task : tasks) {
if (!status.containsKey(task)) {
status.put(task, TaskStatus.EXECUTING_DEPENDENCIES);
executeTasks(task.GetDependencies(), status);
task.Run();
status.put(task, TaskStatus.EXECUTED);
} else if (status.containsKey(task)
&& status.get(task) == TaskStatus.EXECUTING_DEPENDENCIES) {
throw new Exception("Circular dependencies detected!");
}
}
}
public static void main(String[] args) {
Task t1 = new SimpleTask("one");
Task t2 = new SimpleTask("two");
Task t3 = new SimpleTask("three");
Task t4 = new SimpleTask("four");
Task t5 = new SimpleTask("five");
t1.AddDependency(t2);
t1.AddDependency(t5);
t2.AddDependency(t4);
t3.AddDependency(t2);
t3.AddDependency(t4);
System.out.println("Execute tasks (no circular dependencies):");
TaskScheduler scheduler = new TaskScheduler();
scheduler.addTask(t1);
scheduler.addTask(t2);
scheduler.addTask(t3);
scheduler.addTask(t4);
scheduler.addTask(t5);
scheduler.executeTasks();
System.out.println();
System.out.println("Execute tasks (circular dependencies):");
t2.AddDependency(t1);
TaskScheduler scheduler2 = new TaskScheduler();
scheduler2.addTask(t1);
scheduler2.addTask(t2);
scheduler2.addTask(t3);
scheduler2.addTask(t4);
scheduler2.addTask(t5);
scheduler2.executeTasks();
System.out.println();
}
}
import java.util.Set;
interface Task {
void Run();
void AddDependency(Task task);
Set<Task> GetDependencies();
}
import java.util.Set;
import java.util.HashSet;
public class SimpleTask implements Task {
private String str;
private Set<Task> dependencies;
public SimpleTask(String str) {
this.str = str;
this.dependencies = new HashSet<Task>();
}
public void Run() {
System.out.println(str);
}
public void AddDependency(Task task) {
dependencies.add(task);
}
public Set<Task> GetDependencies() {
return dependencies;
}
}
import java.util.Set;
import java.util.HashSet;
import java.util.HashMap;
enum TaskStatus {
EXECUTING_DEPENDENCIES, EXECUTED
}
public class TaskScheduler {
private Set<Task> tasks;
private HashMap<Task, TaskStatus> status;
public TaskScheduler() {
this.tasks = new HashSet<Task>();
this.status = new HashMap<Task, TaskStatus>();
}
public void addTask(Task task) {
tasks.add(task);
}
public void executeTasks() {
try {
executeTasks(tasks, status);
} catch (Exception e) {
System.out.println("Execution of tasks incomplete: " + e.getMessage());
}
}
private static void executeTasks(Set<Task> tasks, HashMap<Task, TaskStatus> status) throws Exception {
for (Task task : tasks) {
if (!status.containsKey(task)) {
status.put(task, TaskStatus.EXECUTING_DEPENDENCIES);
executeTasks(task.GetDependencies(), status);
task.Run();
status.put(task, TaskStatus.EXECUTED);
} else if (status.containsKey(task)
&& status.get(task) == TaskStatus.EXECUTING_DEPENDENCIES) {
throw new Exception("Circular dependencies detected!");
}
}
}
public static void main(String[] args) {
Task t1 = new SimpleTask("one");
Task t2 = new SimpleTask("two");
Task t3 = new SimpleTask("three");
Task t4 = new SimpleTask("four");
Task t5 = new SimpleTask("five");
t1.AddDependency(t2);
t1.AddDependency(t5);
t2.AddDependency(t4);
t3.AddDependency(t2);
t3.AddDependency(t4);
System.out.println("Execute tasks (no circular dependencies):");
TaskScheduler scheduler = new TaskScheduler();
scheduler.addTask(t1);
scheduler.addTask(t2);
scheduler.addTask(t3);
scheduler.addTask(t4);
scheduler.addTask(t5);
scheduler.executeTasks();
System.out.println();
System.out.println("Execute tasks (circular dependencies):");
t2.AddDependency(t1);
TaskScheduler scheduler2 = new TaskScheduler();
scheduler2.addTask(t1);
scheduler2.addTask(t2);
scheduler2.addTask(t3);
scheduler2.addTask(t4);
scheduler2.addTask(t5);
scheduler2.executeTasks();
System.out.println();
}
}
1. This is similar to post-order tree traversal where we execute all dependent children and then execute the parent
2. A stack is maintained to store the parent and its children(dependencies)
3. Initialize stack with root task
4. while(stack is not empty) {
node = stack.pop();
if( node is leaf or its children are visited)
execute node;
else if(node’s children are not visited) {
childrenVisited[node] = true;
stack.push(node); // push the parent back as we need to execute children first
foreach (child of node) {
if(child’s children are not visited) { // this is to break cycle
stack.push(child);
} //foreach
} //if
} //while
Minor modification to the above solution:
step 3: initialize stack by pushing all elements, instead of just root
Iterative implementation of Siddharth's solution using stack:
private void executeTasksIterative() throws Exception {
System.out.println("Iterative execution");
for(Task taskNode: tasks) { // To make sure all nodes are executed
// Initialize stack
if(status.get(taskNode) == TaskStatus.EXECUTED) {
continue; // task and its dependencies are already executed
}
stack.push(taskNode);
while(!stack.empty()) {
Task task = stack.pop();
if( !status.containsKey(task)) { // children are not yet explored
status.put(task, TaskStatus.EXECUTING_DEPENDENCIES);
stack.push(task);
for(Task child: task.GetDependencies() ) {
if(status.get(child) == TaskStatus.EXECUTING_DEPENDENCIES) {
throw new Exception("Circulary dependency detected");
} else if (status.get(child) != TaskStatus.EXECUTED && child.GetDependencies().size() == 0) {
// a child qualifies to run only when its not executed and doesn't have any dependent child
child.Run();
status.put(child, TaskStatus.EXECUTED);
} else if(status.get(child) != TaskStatus.EXECUTED) { // make sure we push only task which is not executed yet
stack.push(child);
}
}
} else if(status.get(task) != TaskStatus.EXECUTED){
// all children are executed, execute the parent
task.Run();
status.put(task, TaskStatus.EXECUTED);
}
}//while
}// for
}
interface Task {
void run();
Set<Task> getDependencies();
}
class SimpleTask extends Thread implements Task{
Set<Task> set;
SimpleTask(){
start();
}
@Override
public void run() {
while(true){
synchronized(this){
try{
wait();
}catch(Exception e){
e.printStackTrace();
}
}
}
}
void addDependencies(Task t){
if(set == null){
set = new HashSet<Task>();
}
set.add(t);
}
@Override
public Set<Task> getDependencies() {
return set;
}
}
class TaskScheduler{
public static void main(String[] args){
Task task = new SimpleTask();
Set<Task> tasks = task.getDependencies();
while(true){
for(Task t : tasks){
synchronized(t){
t.notify();
}
try{
Thread.sleep(37);
}catch(Exception e){
e.printStackTrace();
}
}
}
}
}
Loop through all tasks using a depth-first style of execution. Tasks with no dependencies are run first, and added to the list of already executed tasks (so they don't get executed again).
- hylan039 March 07, 2015