Some context:
I'm using composer-1.3.0-airflow-1.10.0
Installed the PyPI package docker===2.7.0
For a while I tried to use the DockerOperator, but I need to pull images from a private gcr.io registry located in another GCP project, and that is a mess.
I won't go into the details of why I gave up on that. I switched to a simple PythonOperator used to pull and run the Docker image. Here is how the operator goes:
import os

def runImage(**kwargs):
    workingDir = "/app"
    imageName = "eu.gcr.io/private-registry/image"
    # mount the Composer gcsfuse data dir into the container as /out/
    volume = {"/home/airflow/gcs/data/": {"bind": "/out/", "mode": "rw"}}
    userUid = os.getuid()
    command = getContainerCommand()
    client = getClient()

    print("pulling image")
    image = pullDockerImage(client, imageName)
    print("image pulled. %s" % image.id)

    output = client.containers.run(
        image=imageName,
        command=command,
        volumes=volume,
        privileged=True,
        working_dir=workingDir,
        remove=True,
        read_only=False,
        user=userUid)

    print(output)
    return True

task = PythonOperator(
    task_id="test_pull_docker_image",
    python_callable=runImage,
    dag=dag
)
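For completeness, the getClient and pullDockerImage helpers are not shown above; a minimal sketch of what they could look like with the docker SDK (illustrative only, and the authentication against the private gcr.io registry is left out):

import docker

def getClient():
    # talks to the Docker daemon configured via the environment (DOCKER_HOST etc.)
    return docker.from_env()

def pullDockerImage(client, imageName, tag="latest"):
    # returns the docker Image object for the pulled tag
    return client.images.pull(imageName, tag=tag)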
The image is pulled correctly, and the container runs (which was already a victory).
The container writes some files to /out/, which I mounted as a volume to /home/airflow/gcs/data with rw rights.
The working_dir, user, privileged and read_only options were added for testing, but I don't think they're relevant.
The files are not created.
Writing a file directly in Python to /home/airflow/gcs/data works just fine.
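By that I mean something along these lines (just for illustration):

# sanity check: write directly to the gcsfuse mount from the worker
with open("/home/airflow/gcs/data/test.txt", "w") as f:
    f.write("hello from the worker")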
The container itself is a compiled C# application.
Locally, if the container fails to write the files I get an error (like Unhandled Exception: System.UnauthorizedAccessException: Access to the path '/out/file.txt' is denied. ---> System.IO.IOException: Permission denied).
But when I run the DAG inside Airflow Composer everything looks just fine: the container output is as expected and no error is raised.
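In hindsight, one way to get more signal from the docker SDK is to ask for stderr and to catch ContainerError; a sketch, reusing the variables from the operator above (not what the DAG originally did):

import docker

try:
    output = client.containers.run(
        image=imageName,
        command=command,
        volumes=volume,
        stdout=True,
        stderr=True,   # include stderr in the returned logs
        remove=True)
    print(output)
except docker.errors.ContainerError as e:
    # raised when the container exits with a non-zero status code
    print("container failed with exit code %s" % e.exit_status)
    print(e.stderr)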
Maybe the Dockerfile could be useful:
FROM microsoft/dotnet:2.1-sdk AS build-env
WORKDIR /app
# Copy csproj and restore as distinct layers
COPY *.csproj ./
RUN dotnet restore
# Copy everything else and build
COPY . ./
RUN dotnet publish -c Release -o out
# Build runtime image
FROM microsoft/dotnet:2.1-sdk
WORKDIR /app
COPY --from=build-env /app/out .
ENTRYPOINT ["dotnet", "programm.dll"]
So the question is: why does it not write the files, and how can I allow the container to write files to /home/airflow/gcs/data?
So I resolved this issue thanks to my other question.
The answer here is in two parts:
/home/airflow/gcs is a gcsfuse volume. Using this directory for the Docker volume just doesn't work (it may work by adding a plugin, but I lost the link for that :/ ).
We want to add a volume inside the airflow-workers; we can do so by updating the Kubernetes config via kubectl (see this question for how to update the config). We want to add a hostPath:
containers:
  ...
  securityContext:
    privileged: true
    runAsUser: 0
    capabilities:
      add:
        - SYS_ADMIN
  ...
  volumeMounts:
    - mountPath: /etc/airflow/airflow_cfg
      name: airflow-config
    - mountPath: /home/airflow/gcs
      name: gcsdir
    - mountPath: /var/run/docker.sock
      name: docker-host
    - mountPath: /bin/docker
      name: docker-app
    - mountPath: /path/you/want/as/volume
      name: mountname
  ...
volumes:
  - configMap:
      defaultMode: 420
      name: airflow-configmap
    name: airflow-config
  - emptyDir: {}
    name: gcsdir
  - hostPath:
      path: /path/you/want/as/volume
      type: DirectoryOrCreate
    name: mountname
  - hostPath:
      path: /var/run/docker.sock
      type: ""
    name: docker-host
  - hostPath:
      path: /usr/bin/docker
      type: ""
    name: docker-app
And now in the DAG definition we can use
volume = {"/path/you/want/as/volume": {"bind": "/out/", "mode": "rw"}}
The files will exist inside the pod, and you can use another task to upload them to a GCS bucket or similar.
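If it helps, a minimal sketch of such an upload task using the GCS hook that ships with Airflow 1.10 (the bucket name, object prefix and task id here are made up, and it assumes the upload task runs on a worker that has the same hostPath mounted):

import os
from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook
from airflow.operators.python_operator import PythonOperator

def uploadOutputFiles(**kwargs):
    hook = GoogleCloudStorageHook()
    localDir = "/path/you/want/as/volume"
    for fileName in os.listdir(localDir):
        # upload each file produced by the container to the target bucket
        hook.upload(
            bucket="your-target-bucket",
            object="docker-output/" + fileName,
            filename=os.path.join(localDir, fileName))

upload_task = PythonOperator(
    task_id="upload_docker_output",
    python_callable=uploadOutputFiles,
    dag=dag
)

task >> upload_task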
Hope it can help somewhat :)