I'm testing a pipeline with Luigi and I've noticed strange caching behavior in the task visualizer. For one thing, tasks seem to stay in the cache for a set time, sometimes overlapping with tasks from a second run of the pipeline, which clutters the UI. I've also noticed that when two pipelines are run in succession, it takes a while for tasks from the new pipeline to appear. Is there a way to manually reset the cache before each run? Is there a configuration variable that sets how long tasks are cached before they expire?
You can use the scheduler's remove_delay setting. In your Luigi config file:
[scheduler]
remove_delay = 10
This is a scheduler setting, so you need to restart luigid for it to take effect.
From the docs:
Number of seconds to wait before removing a task that has no stakeholders. Defaults to 600 (10 minutes).
In my experience, "stakeholders" here seems to mean workers and upstream/downstream dependencies.
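If you want to double-check which value a given config file would give the scheduler, a small sketch like the one below may help. It only uses Python's standard configparser and does not call into Luigi itself; the helper name read_remove_delay and the sample config text are made up for illustration, and the 600 fallback simply mirrors the documented default.

```python
import configparser


def read_remove_delay(cfg_text, default=600):
    """Return the [scheduler] remove_delay value from config text.

    Hypothetical helper for illustration only, not part of Luigi.
    Falls back to `default` (the documented 600 seconds) when the
    section or the option is missing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.getint("scheduler", "remove_delay", fallback=default)


# Sample config text matching the snippet in the answer above.
sample = """
[scheduler]
remove_delay = 10
"""

print(read_remove_delay(sample))
```

To check a real file, read its contents and pass them in, e.g. read_remove_delay(open(path).read()), where path points at whichever config file your luigid actually loads.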