We are defining multiple components in Kubeflow Pipelines using dsl.ContainerOp.
The requirement involves two steps.
First, we need to run a download task that takes an input URL and downloads the file inside the container.
Second, we need to take the file generated in the first step and run a Python program on it. This will be done in the second ContainerOp.
Sample code is below.
    @dsl.component
    def download(url: str, output_file: OutputPath(str)):
        return dsl.ContainerOp(
            name='Download',
            image='busybox:latest',
            command=["sh", "-c"],
            arguments=["wget %s " % url, output_file],
        )
The code above is invoked using:
    download_task = download(url="<URL>")
As per the component spec (https://www.kubeflow.org/docs/components/pipelines/reference/component-spec/), the output path does not need to be specified explicitly.
In the @func_to_container_op sample (https://github.com/kubeflow/pipelines/blob/d106a6533bf4e1cbda4364560bc7526cb67d4eb2/samples/tutorials/Data%20passing%20in%20python%20components/Data%20passing%20in%20python%20components%20-%20Files.py#L69), we can see a way to get the output using the OutputPath type.
Is there any way to achieve this with dsl.ContainerOp? We don't want to hardcode the output path using file_outputs.
You cannot do this with ContainerOp. That was one of the reasons ContainerOp has been deprecated; refer to https://github.com/kubeflow/pipelines/pull/4166.
Suggestion: use the kfp.components.load_component_from_text method; refer to this example pipeline.
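As a rough sketch of that suggestion (not the linked example itself), the two steps could be written as component specs loaded with kfp.components.load_component_from_text. The {outputPath: ...} and {inputPath: ...} placeholders let the system choose the file location, so nothing is hardcoded via file_outputs. The output name `data`, the `python:3.9` image, the processing script, and the pipeline name are all hypothetical choices for illustration, and the sketch assumes the KFP v1 SDK is installed.

```python
# Component spec for the download step. {outputPath: data} is replaced at
# runtime with a system-generated file path, so no path is hardcoded.
DOWNLOAD_COMPONENT_TEXT = """
name: Download
inputs:
- {name: url, type: String}
outputs:
- {name: data}
implementation:
  container:
    image: busybox:latest
    command:
    - sh
    - -c
    - |
      mkdir -p "$(dirname "$1")"
      wget "$0" -O "$1"
    - {inputValue: url}
    - {outputPath: data}
"""

# Hypothetical second step: reads the downloaded file from a
# system-provided path ({inputPath: data}) and prints its beginning.
PROCESS_COMPONENT_TEXT = """
name: Process
inputs:
- {name: data}
implementation:
  container:
    image: python:3.9
    command:
    - python3
    - -c
    - |
      import sys
      with open(sys.argv[1]) as f:
          print(f.read()[:200])
    - {inputPath: data}
"""


def build_pipeline():
    """Return a pipeline function that wires the two components together."""
    # Imported lazily so the specs above can be inspected without kfp.
    from kfp import components, dsl

    download_op = components.load_component_from_text(DOWNLOAD_COMPONENT_TEXT)
    process_op = components.load_component_from_text(PROCESS_COMPONENT_TEXT)

    @dsl.pipeline(name="download-and-process")
    def download_and_process(url: str):
        download_task = download_op(url=url)
        # Passing the output reference makes KFP hand the file to step two.
        process_op(data=download_task.outputs["data"])

    return download_and_process
```

Calling build_pipeline() returns a pipeline function that can be compiled or submitted in the usual way; the download step's output is referenced by name (download_task.outputs["data"]) rather than by a fixed container path.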