python pipeline kedro

How to run a pipeline except for a few nodes?


I want to run a pipeline on different files, but some of them don't need every node defined in it. How can I skip those nodes?


Solution

  • To leave a few nodes out of a pipeline, you can filter the pipeline's node list from inside Python. My favorite way is to use a list comprehension.

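    by name

    from kedro.pipeline import Pipeline
    from kedro.runner import SequentialRunner

    nodes_to_run = [node for node in pipeline.nodes if 'dont_run_me' not in node.name]
    SequentialRunner().run(Pipeline(nodes_to_run), io)
    

    by tag

    # node.tags is a set of strings; same imports as in the "by name" example
    nodes_to_run = [node for node in pipeline.nodes if 'dont_run_tag' not in node.tags]
    SequentialRunner().run(Pipeline(nodes_to_run), io)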
    

    It's possible to filter by any attribute of a pipeline node (name, inputs, outputs, short_name, tags).

    If you need to run your pipeline this way in production or from the command line, you can either tag the nodes and run the pipeline by tag, or add a custom click.option to your run function inside kedro_cli.py and apply this filter when the flag is True.

    Note: This assumes that you have your pipeline loaded into memory as pipeline and your catalog loaded as io.
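If you want to combine the name and tag filters, one option is to wrap them in a small helper. The sketch below is only an illustration: select_nodes and its parameters are hypothetical names, not part of Kedro, and the SimpleNamespace objects are stand-ins so the snippet runs without a project. It assumes only that each node exposes name and tags, as the nodes in pipeline.nodes do.

```python
from types import SimpleNamespace


def select_nodes(nodes, exclude_name_parts=(), exclude_tags=()):
    """Return the nodes that match none of the exclusion rules.

    Hypothetical helper: it relies only on each node exposing
    ``name`` (str) and ``tags`` (a collection of str).
    """
    excluded_tags = set(exclude_tags)
    selected = []
    for node in nodes:
        # Skip a node when any unwanted fragment occurs in its name.
        if any(part in node.name for part in exclude_name_parts):
            continue
        # Skip a node when it carries at least one unwanted tag.
        if excluded_tags & set(node.tags):
            continue
        selected.append(node)
    return selected


# Stand-ins for real nodes, used here only so the sketch runs on its own.
demo_nodes = [
    SimpleNamespace(name="load_data", tags={"io"}),
    SimpleNamespace(name="dont_run_me_cleaning", tags=set()),
    SimpleNamespace(name="train_model", tags={"dont_run_tag"}),
    SimpleNamespace(name="report", tags={"io", "final"}),
]

kept = select_nodes(
    demo_nodes,
    exclude_name_parts=["dont_run_me"],
    exclude_tags=["dont_run_tag"],
)
kept_names = [node.name for node in kept]
print(kept_names)
```

In a real project you would pass pipeline.nodes instead of demo_nodes and then run the returned list against io, in the same way as the filtered lists in the answer.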