How to use "host_tmp_dir" in DockerOperator in Airflow?


In a DAG, I am using a DockerOperator that needs a temporary directory mounted to store some data. The container has to use a particular path on the host for this temporary directory, so I am trying to use the "host_tmp_dir" parameter of the DockerOperator, but it is not working.

Consider the following example DAG:

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator
from datetime import datetime

with DAG(
    dag_id="test_v1",
    start_date=datetime(2022, 7, 10),
    catchup=False,
) as dag:

    t = DockerOperator(
        task_id="my_job",
        api_version="auto",
        image="debian:11-slim",
        host_tmp_dir="/tmp",
        tmp_dir="/data",
        mount_tmp_dir=True,
        command=["ls", "/data"],
        auto_remove="force",
    )

With this example, I would expect the logs to show the contents of my host's /tmp directory (which is not empty). Instead the logs are empty, meaning that /data in the container is empty and the host's /tmp was not mapped to it.

I'm using Airflow 2.3.3.

Maybe I missed something. Do you have any idea?


Solution

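  • I found the explanation. host_tmp_dir is not the host directory that gets mounted directly into the container. It is the host directory IN WHICH a new temporary directory is created, and that new directory is what gets mounted in the container (at the tmp_dir path). Since the new directory has just been created, it is empty, which is why "ls /data" prints nothing even though /tmp on the host has content.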
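
The behaviour described above can be reproduced with the standard library alone. This is only a sketch of the idea, not the operator's actual code: the helper name simulate_tmp_mount and the "airflowtmp" prefix are assumptions made for illustration, and a throwaway directory stands in for the host's /tmp so that nothing real is touched.

```python
import os
import tempfile


def simulate_tmp_mount(host_tmp_dir):
    """Hypothetical helper: create a fresh temp dir INSIDE host_tmp_dir.

    The returned dict describes the directory that would be bind-mounted
    into the container, not host_tmp_dir itself.
    """
    # The "airflowtmp" prefix is an assumption used for illustration.
    with tempfile.TemporaryDirectory(prefix="airflowtmp", dir=host_tmp_dir) as generated:
        return {
            "mounted_source": generated,
            "parent": os.path.dirname(generated),
            "contents": os.listdir(generated),
        }


# A throwaway directory plays the role of the host's (non-empty) /tmp.
with tempfile.TemporaryDirectory() as fake_host_tmp:
    open(os.path.join(fake_host_tmp, "existing_file.txt"), "w").close()
    result = simulate_tmp_mount(fake_host_tmp)

# What "ls /data" would list: the freshly created directory, not its parent.
print(result["contents"])  # prints []
```

If the goal is to expose an existing host directory to the container, a bind mount passed through the operator's mounts parameter (a list of docker.types.Mount objects) is probably the better fit than host_tmp_dir; check the documentation of your provider version before relying on it.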