In a DAG, I am using a DockerOperator, in which I need to mount a temporary directory to store some data. The container has to use a particular path on the host for this temporary directory, so I am trying to use the "host_tmp_dir" parameter of the DockerOperator, but this is not working.
Consider the following example DAG:
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

with DAG(dag_id="test_v1",
         start_date=datetime(2022, 7, 10),
         catchup=False) as dag:
    t = DockerOperator(
        task_id='my_job',
        api_version='auto',
        image="debian:11-slim",
        host_tmp_dir="/tmp",
        tmp_dir="/data",
        mount_tmp_dir=True,
        command=["ls", "/data"],
        auto_remove='force'
    )
With this example, I would expect the logs to show the contents of my host's /tmp directory (which is not empty), but the logs are empty. In other words, /data in the container is empty, so the mapping does not seem to be applied.
I'm using Airflow 2.3.3.
Maybe I missed something. Any ideas?
I found the explanation. host_tmp_dir is not the host directory that gets mounted directly in the container. It is the host directory IN WHICH a temporary directory is created, and that new (initially empty) temporary directory is what gets mounted in the container at tmp_dir. That is why /data is empty in my example.
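To make this concrete, here is a minimal sketch of the behaviour using only Python's standard library, with no Airflow or Docker involved. It assumes the operator does something equivalent to creating a `tempfile.TemporaryDirectory` inside host_tmp_dir and bind-mounting that new directory. The function name `simulate_tmp_mount` and the "airflowtmp" prefix are my own illustrative choices, not something confirmed from the provider's documentation.

```python
import os
import tempfile


def simulate_tmp_mount(host_tmp_dir):
    """Hypothetical helper: mimic how a temp dir is created INSIDE host_tmp_dir.

    Returns the path of the generated directory and its contents at the
    moment it would be mounted into the container.
    """
    # Assumption: the operator creates a fresh directory under host_tmp_dir
    # and mounts that directory, not host_tmp_dir itself.
    with tempfile.TemporaryDirectory(prefix="airflowtmp", dir=host_tmp_dir) as generated:
        return generated, os.listdir(generated)


# Stand-in for the host's /tmp, populated with one file.
with tempfile.TemporaryDirectory() as fake_host_tmp:
    with open(os.path.join(fake_host_tmp, "existing.txt"), "w") as f:
        f.write("data")

    mounted_path, mounted_contents = simulate_tmp_mount(fake_host_tmp)
    # The generated directory is removed again once the "task" finishes,
    # so only the original file is left in the parent.
    host_contents_after = os.listdir(fake_host_tmp)

print("would be mounted:", mounted_path)
print("contents seen by the container:", mounted_contents)
print("contents of the parent directory:", host_contents_after)
```

The directory the container sees is a fresh subdirectory of the parent, so it starts out empty even though the parent is not. If the goal is to expose an existing host directory such as /tmp itself, I believe the usual route is the operator's mounts parameter with a bind mount (for example `docker.types.Mount(source="/tmp", target="/data", type="bind")`), but I have not verified this on Airflow 2.3.3.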