Search code examples
pythonmachine-learningkedro

How to run the nodes in sequence as declared in kedro pipeline?


In Kedro pipeline, nodes (something like python functions) are declared sequentially. In some cases, the input of one node is the output of the previous node. However, sometimes, when kedro run API is called in the commandline, the nodes are not run sequentially.

In kedro documentation, it says that by default the nodes are ran in sequence.

My run.py code:

def main(
tags: Iterable[str] = None,
env: str = None,
runner: Type[AbstractRunner] = None,
node_names: Iterable[str] = None,
from_nodes: Iterable[str] = None,
to_nodes: Iterable[str] = None,
from_inputs: Iterable[str] = None,
):

project_context = ProjectContext(Path.cwd(), env=env)
project_context.run(
    tags=tags,
    runner=runner,
    node_names=node_names,
    from_nodes=from_nodes,
    to_nodes=to_nodes,
    from_inputs=from_inputs,
)

Currently my last node is sometimes ran before my first few nodes.


Solution

  • The answer that I recieved from Kedro github:

    Pipeline determines the node execution order exclusively based on dataset dependencies (node inputs and outputs) at the moment. So the only option to dictate that the node A should run before node B is to put a dummy dataset as an output of node A and an input of node B.