Users send us orders. When an order arrives we run a pipeline consisting of several tasks. Some tasks take days of compute time.
# Example pipeline
order -> task1 -> task2 -> task3 -> Complete
I want to start processing each order as it comes in, so our compute cluster will be processing multiple orders, each with its own run of the pipeline, simultaneously.
# Diagram showing each order arriving then being processed over time.
| order1 -> t1 -> t2 -> t3 -> complete
| order2 -> t1 -> t2 -> t3 -> complete
| order3 -> t1 -> t2 -> t3 -> complete
| order4 -> t1 -> t2 -> t3 -> complete
| order5 -> t1 -> t
------------------------------------------------------------------
time
Can you do this in Luigi?
I don't think that you can: when I implement a test pipeline and then try to start a second instance of it while the first is still running, the second instance exits with the following output.
Pid(s) set([11004]) already running
Process finished with exit code 0
As long as the parameters are different, the tasks will have different ids and can run simultaneously. If you run a task with the same parameters, such that the id-generation function produces the same id, Luigi sees it as already running. If the generated ids differ, you can definitely run more than one instance of the same task.
Here's the Task ID generation method: https://github.com/spotify/luigi/blob/master/luigi/task.py#L118
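Roughly, that method derives the id from the task family plus its significant parameters. A simplified stand-in (not Luigi's exact algorithm; the function name and hash scheme here are illustrative only) shows why identical parameters collide and different parameters don't:

```python
import hashlib

def task_id_str(task_family, params):
    """Illustrative sketch of a task-id scheme like Luigi's:
    the id combines the task family name with a digest of the
    task's parameters, so the same (family, params) pair always
    maps to the same id."""
    param_summary = hashlib.md5(
        str(sorted(params.items())).encode("utf-8")
    ).hexdigest()[:10]
    return f"{task_family}_{param_summary}"

# Same family + same params -> same id (seen as "already running");
# a different order_id -> a different id, free to run in parallel.
```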