
How can you ensure that the same pipeline is not executed twice at the same time?


Hey :) I have a question regarding locking or mutex behavior.

Scenarios

Let's assume the following scenarios:

  1. The pipeline works with some local files that were placed by CI/CD jobs. After processing, I'd like to remove the files. This would result in a race condition if the job takes longer than the schedule interval.
  2. Two pipelines are very resource-heavy and therefore cannot run in parallel.

Possible solutions

  • Currently I would use some kind of mutex or lock in a running service, where pipelines can register and are then either allowed to execute or not.
  • Duplicate the data so that each workflow can clean up and use its own copy.
  • Create a local lock file and ensure that the file is removed on success.
  • Use a smaller schedule interval and check whether the lock exists; exit cleanly if the condition is not fulfilled.

I know that this might not be a typical use case for Dagster, but I'd also like to use Dagster for other workflows such as cleanup tasks and triggering other pipelines.

Thanks


Solution

  • Thanks for sharing your use case. I don't think Dagster currently supports these features natively. However, the 0.10.0 release (a few months out) will include run-level queuing, allowing you to place limits on concurrent pipeline runs. Currently it only supports a global limit on runs, but it will soon support rules based on pipeline tags (e.g. pipelines tagged 'resource-heavy' could be limited to 3 concurrent runs). It seems like that might fit this use case?

    A guide to previewing the current queuing system is here. Also, feel free to reach out to me on the Dagster Slack at @johann!
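For reference, here is a sketch of what the queued-run setup might look like, based on the 0.10 preview (key names and structure may change before release): tag the pipeline, e.g. `@pipeline(tags={"resource-heavy": "true"})`, then enable the queued run coordinator with a tag-based limit in the instance's `dagster.yaml`:

```yaml
# dagster.yaml -- sketch based on the 0.10 preview; exact keys may change
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 10       # global cap on in-flight runs
    tag_concurrency_limits:       # per-tag caps, checked at dequeue time
      - key: "resource-heavy"
        limit: 3
```

With this in place, runs beyond the limit would wait in the queue rather than executing concurrently, which addresses the resource-heavy scenario without application-level locking.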