Tags: docker, google-cloud-vertex-ai, kubeflow-pipelines, vertex-ai-pipeline

Vertex AI Docker in Docker: Cannot connect to the Docker daemon


So I am using Kubeflow (KFP) Containerized Python Components to create a Vertex AI training pipeline. As the last step of the pipeline, I would like to build and push a custom prediction container image using a Custom Prediction Routine (CPR), with the newly trained model baked in.
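
For reference, such a containerized component is declared roughly as follows (the image path is a placeholder, not my actual registry):

from kfp import dsl

@dsl.component(
    base_image="python:3.11",
    # `kfp component build` builds and pushes this image (placeholder path)
    target_image="europe-docker.pkg.dev/my-project/my-repo/build-predictor",
)
def build_prediction_image(model_artifact_uri: str):
    # Builds and pushes the CPR prediction image with the trained model
    # baked in; this is where the Docker daemon error below occurs.
    ...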

In the Dockerfile that builds the image for the component handling this step, I install and enable Docker like so:

# Install Docker via Docker's convenience script (this already installs docker-ce)
RUN curl -fsSL https://get.docker.com | sh

RUN apt-get update && \
    apt-get install -y docker-ce && \
    systemctl enable docker

Then in the component Python code I have:

from google.cloud.aiplatform.prediction import LocalModel

# Predictor is my custom CPR predictor class; PREDICT_CONTAINER_DIR is a
# pathlib.Path pointing at the directory containing the predictor source code
local_model = LocalModel.build_cpr_model(
    str(PREDICT_CONTAINER_DIR),
    "my-docker-image",
    base_image="python:3.11",
    predictor=Predictor,
    requirements_path=str(PREDICT_CONTAINER_DIR / "requirements.txt"),
)

to build the prediction container image. This fails with:

ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

After some searching, I found this post, which faces a similar issue running Docker inside a Docker container. However, I have no access to the docker run command that starts the component container, so I cannot pass -v /var/run/docker.sock:/var/run/docker.sock as the top answer suggests; that invocation is handled by Vertex AI Pipelines / KFP. I checked the component task configuration as well.

I also tried to use docker as the base image of the first container (running Docker in Docker), but then getting Python installed on it turned out to be a bit of a pain.

Am I doing something wrong? Is there a better way to get my prediction container deployed?


Solution

  • GCP support confirmed that, for security reasons, it is not possible to run Docker inside Vertex AI Pipelines components. Docker images can instead be built from inside a component with Cloud Build, for example via its Python client library; a sketch follows below.
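
The following is a minimal sketch of that approach, assuming the google-cloud-build and google-cloud-storage packages are installed in the component image and that a Dockerfile for the prediction image sits in PREDICT_CONTAINER_DIR. The project, bucket, and image names are placeholders.

import tarfile

from google.cloud import storage
from google.cloud.devtools import cloudbuild_v1

PROJECT_ID = "my-project"           # placeholder
BUCKET = "my-staging-bucket"        # placeholder
IMAGE_URI = "europe-docker.pkg.dev/my-project/my-repo/my-docker-image"  # placeholder

# Package the build context (Dockerfile + predictor code) as a tarball
with tarfile.open("/tmp/context.tar.gz", "w:gz") as tar:
    tar.add(str(PREDICT_CONTAINER_DIR), arcname=".")

# Upload the context to GCS so Cloud Build can fetch it
storage.Client(project=PROJECT_ID).bucket(BUCKET).blob(
    "context.tar.gz"
).upload_from_filename("/tmp/context.tar.gz")

# Submit the build: one docker build step; Cloud Build pushes the
# images listed in `images` after all steps succeed
client = cloudbuild_v1.CloudBuildClient()
build = cloudbuild_v1.Build(
    source=cloudbuild_v1.Source(
        storage_source=cloudbuild_v1.StorageSource(
            bucket=BUCKET, object_="context.tar.gz"
        )
    ),
    steps=[
        cloudbuild_v1.BuildStep(
            name="gcr.io/cloud-builders/docker",
            args=["build", "-t", IMAGE_URI, "."],
        )
    ],
    images=[IMAGE_URI],
)
operation = client.create_build(project_id=PROJECT_ID, build=build)
result = operation.result()  # blocks until the build finishes
print(result.status)

Two caveats: the pipeline's service account needs permission to create builds (e.g. roles/cloudbuild.builds.editor) and to read and write the staging bucket, and since LocalModel.build_cpr_model generates its Dockerfile locally, you have to write an equivalent Dockerfile for the CPR image yourself when going through Cloud Build.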