How to use OutputPath across multiple components in Kubeflow


We are defining multiple components in Kubeflow Pipelines using dsl.ContainerOp.

The requirement involves two steps.

  1. First, we need to run a download task, which takes an input URL and downloads the file inside the container.

  2. Then we need to take the file generated in the first step and run a Python program on it. This will be done in the second ContainerOp.

Sample code is below.

@dsl.component
def download(url: str, output_file: OutputPath(str)):
    return dsl.ContainerOp(
        name='Download',
        image='busybox:latest',
        command=["sh", "-c"],
        arguments=["wget %s " % url, output_file],
    )

The code above is invoked with

download_task = download(url="<URL>")

According to the component spec (https://www.kubeflow.org/docs/components/pipelines/reference/component-spec/), the output path does not need to be specified.

The sample at https://github.com/kubeflow/pipelines/blob/d106a6533bf4e1cbda4364560bc7526cb67d4eb2/samples/tutorials/Data%20passing%20in%20python%20components/Data%20passing%20in%20python%20components%20-%20Files.py#L69 uses @func_to_container_op, and there we can see a way to get the output using the OutputPath type.

Is there any way to achieve this with dsl.ContainerOp? We don't want to hardcode the output path using file_outputs.


Solution

  • You cannot do this with ContainerOp; that was one of the reasons ContainerOp was deprecated. See https://github.com/kubeflow/pipelines/pull/4166.

    Suggestions:

    1. Follow https://www.kubeflow.org/docs/components/pipelines/reference/component-spec/ to build your reusable component YAML.
    2. If you prefer inlining component YAML for one-off components, you can load it with the kfp.components.load_component_from_text method; refer to this example pipeline.