python gcloud google-cloud-vertex-ai kubeflow kubeflow-pipelines

Same component works when defined through @component but it fails when created with create_component_from_func


I have a Docker container in gcloud with all the code I need (multiple files, etc.). I'm able to run it when I define the component using the @component decorator: I define the function and set the base_image. The component does a few things and loads code from the container as expected.

However, it fails when I create the component through a function. First, I create the component with the create_component_from_func function (same function body, same container image). Then I use it.

Then I create a pipeline with those 2 components (in the example they are just 2 components disconnected). I would expect the same result from both components, but the 2nd fails. It cannot import the functions. I did some prints and checks (even reading the code itself) and it is there. Everything looks exactly the same. I thought those were analogous approaches, and I guess they are not.

Any idea what the differences are, and why it works with @component but not with create_component_from_func?

This is obviously not the code (I would need to provide a container, etc.) but you can get an idea of what I'm doing:

from kfp.components import create_component_from_func
from kfp.v2.dsl import component

def add_values_from_func(a:int, b:int) -> int:
    # I can do stuff and print
    from container_file import transformer_function
    return transformer_function(a) + transformer_function(b)

@component(base_image="my_docker_image_in_gcloud")
def add_values(a:int, b:int) -> int:
    # I can do stuff and print
    from container_file import transformer_function
    return transformer_function(a) + transformer_function(b)


# PIPELINE CODE
### This component works
add_values_task = (
    add_values(a=1, b=2)
)

### This component errors out saying that it cannot find the transformer_function
add_values_from_func_component = create_component_from_func(
    func=add_values_from_func,
    base_image="my_docker_image_in_gcloud",
)
add_values_from_func_component_task = (
    add_values_from_func_component(a=1, b=2)
)

Solution

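  • For some reason, the effective Python import path (PYTHONPATH / sys.path) is different when you define a component through @component than when you define it through create_component_from_func.

    In the second case, the container's working directory (WORKDIR) is not on the path, so modules that live there cannot be imported.

    There are two ways to fix this:

    • Add ENV PYTHONPATH "${PYTHONPATH}:${WORKDIR}" to the Dockerfile. Note that Docker does not define a WORKDIR variable by itself, so either declare it first with ENV or ARG, or write the directory path out literally.
    • Add the working directory at the beginning of the component function: sys.path.append(os.getcwd()) (with import os and import sys inside the function body).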
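A possible explanation, which the answer above does not confirm: when Python runs a script by file path, it puts the script's own directory at the front of sys.path, not the current working directory. If the generated component code is written to a temporary file and executed from there, modules in the WORKDIR would not be importable. The sketch below reproduces that situation locally without KFP or Docker. The run_component helper, the ephemeral_component.py file name, and the doubling transformer_function are hypothetical stand-ins, not the real container code or the real KFP launcher.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the module that lives in the image's WORKDIR.
CONTAINER_FILE = "def transformer_function(x):\n    return x * 2\n"

# Component body as in the question: the import assumes WORKDIR is importable.
COMPONENT_BODY = (
    "from container_file import transformer_function\n"
    "print(transformer_function(1) + transformer_function(2))\n"
)

# Same body with the sys.path workaround from the answer prepended.
PATCHED_BODY = "import os\nimport sys\nsys.path.append(os.getcwd())\n" + COMPONENT_BODY


def run_component(script_body, workdir):
    """Write the script to a temp dir outside workdir, then run it with cwd=workdir."""
    # Drop PYTHONPATH so the result does not depend on the caller's environment.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    with tempfile.TemporaryDirectory() as script_dir:
        script_path = os.path.join(script_dir, "ephemeral_component.py")
        with open(script_path, "w") as f:
            f.write(script_body)
        return subprocess.run(
            [sys.executable, script_path],
            cwd=workdir, env=env, capture_output=True, text=True,
        )


# The temporary workdir plays the role of the container's WORKDIR.
with tempfile.TemporaryDirectory() as workdir:
    with open(os.path.join(workdir, "container_file.py"), "w") as f:
        f.write(CONTAINER_FILE)
    result_without = run_component(COMPONENT_BODY, workdir)
    result_with = run_component(PATCHED_BODY, workdir)

print("without fix, exit code:", result_without.returncode)
print("with fix, output:", result_with.stdout.strip())
```

In this simulation the unpatched script fails on the import even though container_file.py sits in the current working directory, while the patched one runs. Setting PYTHONPATH in the Dockerfile has the same effect without touching the component code, which may be preferable when many components share one image.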