I want to run a pipeline for different files, but some of them don't need all of the defined nodes. How can I skip those nodes?
To leave a few nodes out of a pipeline, you can filter the pipeline's node list from inside Python. My favorite way is to use a list comprehension.
By name:
nodes_to_run = [node for node in pipeline.nodes if 'dont_run_me' not in node.name]
run(nodes_to_run, io)
By tag:
nodes_to_run = [node for node in pipeline.nodes if 'dont_run_tag' not in node.tags]
run(nodes_to_run, io)
You can filter on any attribute of the pipeline node (name, inputs, outputs, short_name, tags).
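As a minimal sketch of the same idea applied to another attribute, the snippet below drops every node that writes a particular dataset. The Node tuple is a hypothetical stand-in that only mimics the attributes listed above, so the example runs without Kedro; with a real pipeline you would iterate over pipeline.nodes instead. The node and dataset names are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a Kedro node: only the attributes used for filtering.
Node = namedtuple("Node", ["name", "inputs", "outputs", "tags"])

nodes = [
    Node("load_raw", ["raw"], ["clean"], {"core"}),
    Node("make_report", ["clean"], ["report"], {"optional"}),
    Node("train_model", ["clean"], ["model"], {"core"}),
]

# Skip any node that produces the (made-up) "report" dataset.
nodes_to_run = [node for node in nodes if "report" not in node.outputs]

print([node.name for node in nodes_to_run])
```

Only make_report writes the report dataset, so it is the one node left out. Swapping node.outputs for node.inputs or node.tags gives the other variants.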
If you need to run your pipeline this way in production or from the command line, you can either tag your nodes and run the pipeline by tag, or add a custom click.option to your run function inside of kedro_cli.py and apply this filter when the flag is True.
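The flag approach could look roughly like the sketch below. It is not the contents of a real kedro_cli.py: the --skip-optional flag, the select_nodes helper, and the NODES mapping are all hypothetical names, and the mapping stands in for pipeline.nodes so the example runs with only click installed.

```python
import click

# Hypothetical node names and tags, standing in for pipeline.nodes.
NODES = {"load_raw": {"core"}, "make_report": {"optional"}, "train_model": {"core"}}


def select_nodes(skip_optional):
    """Return the node names to run, filtering only when the flag is set."""
    if not skip_optional:
        return list(NODES)
    return [name for name, tags in NODES.items() if "optional" not in tags]


@click.command()
@click.option("--skip-optional", is_flag=True, default=False,
              help="Leave out nodes tagged 'optional'.")
def run(skip_optional):
    """Sketch of a run command; a real one would hand the nodes to a runner."""
    click.echo(", ".join(select_nodes(skip_optional)))
```

Keeping the filter in a plain function separate from the click command makes it easy to reuse from Python and to test without going through the command line.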
Note: This assumes that you have your pipeline loaded into memory as pipeline and your catalog loaded in as io.