Does Kedro support Checkpointing/Caching of Results?


Let's say we have multiple long-running pipeline nodes. It seems quite straightforward to checkpoint or cache the intermediate results, so that when nodes after a checkpoint are changed or added, only those nodes need to be executed again.

Does Kedro provide functionality to make sure that, when I run the pipeline, only the steps that have changed are executed? And the reverse: is there a way to make sure that all steps that have changed are executed?

Let's say a pipeline that produces some intermediate result has changed. Will it be executed when I execute a pipeline that depends on the output of the first?

TL;DR: Does Kedro have makefile-like tracking of what needs to be done and what does not?

I think my question is similar to issue #341, but I do not require support for cyclic graphs.


Solution

  • You might want to have a look at the IncrementalDataSet, which is covered alongside the partitioned dataset in the documentation, specifically in the section on incremental loads with the incremental dataset. It has a notion of "checkpointing", although checkpointing there is a manual step and is not automated the way a makefile is.
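
To make the idea of a manual checkpoint concrete, here is a minimal plain-Python sketch. It is not Kedro's implementation and does not use the Kedro API: the class `IncrementalFolder`, the `CHECKPOINT` file name, and the `demo` function are all hypothetical names chosen for illustration. The assumption is that partitions are files whose names sort in processing order, and that the checkpoint only moves forward when `confirm()` is called explicitly.

```python
import tempfile
from pathlib import Path


class IncrementalFolder:
    """Hypothetical helper (not part of Kedro): loads only the partitions
    whose file names sort after the last confirmed checkpoint."""

    def __init__(self, folder):
        self.folder = Path(folder)
        # Assumed convention: the checkpoint lives next to the partitions.
        self.checkpoint_file = self.folder / "CHECKPOINT"
        self._last_loaded = None

    def _read_checkpoint(self):
        # An empty string sorts before every partition name.
        if self.checkpoint_file.exists():
            return self.checkpoint_file.read_text()
        return ""

    def load(self):
        """Return {partition name: content} for partitions after the checkpoint."""
        checkpoint = self._read_checkpoint()
        names = sorted(
            p.name for p in self.folder.glob("*.csv") if p.name > checkpoint
        )
        if names:
            self._last_loaded = names[-1]
        return {name: (self.folder / name).read_text() for name in names}

    def confirm(self):
        """Manual step: persist the newest loaded partition as the checkpoint."""
        if self._last_loaded is not None:
            self.checkpoint_file.write_text(self._last_loaded)


def demo():
    """Show that nothing is skipped until the checkpoint is confirmed."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "2020-01-01.csv").write_text("a")
        (folder / "2020-01-02.csv").write_text("b")

        data = IncrementalFolder(folder)
        first = sorted(data.load())   # both partitions
        second = sorted(data.load())  # still both: nothing was confirmed
        data.confirm()                # checkpoint moves to the newest partition

        (folder / "2020-01-03.csv").write_text("c")
        third = sorted(data.load())   # only the partition added afterwards
        return [first, second, third]


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the difference from a makefile: nothing here inspects whether the processing code or upstream outputs have changed. The only thing that decides what is loaded again is whether someone confirmed the checkpoint.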