python · docker · airflow · dbt

Permission denied: 'dbt_modules'


I get this error when running dbt deps. Am I missing something in the Dockerfile to give dbt write access to dbt_modules? I can't find where it's located, or even what dbt_modules is, in the documentation. I've included the relevant YAML files below. Thanks in advance.

Traceback

2022-06-07 04:34:59.121970 (MainThread): Running with dbt=0.21.1
2022-06-07 04:34:59.972911 (MainThread): You have an incompatible version of 'pyarrow' installed (6.0.1), please install a version that adheres to: 'pyarrow<3.1.0,>=3.0.0; extra == "pandas"'
2022-06-07 04:35:00.477470 (MainThread): running dbt with arguments Namespace(cls=<class 'dbt.task.deps.DepsTask'>, debug=False, defer=None, log_cache_events=False, log_format='default', partial_parse=None, profile=None, profiles_dir='/home/airflow/.dbt', project_dir=None, record_timing_info=None, rpc_method='deps', single_threaded=False, state=None, strict=False, target=None, test_new_parser=False, use_cache=True, use_colors=None, use_experimental_parser=False, vars='{}', warn_error=False, which='deps', write_json=True)
2022-06-07 04:35:00.478141 (MainThread): Tracking: tracking
2022-06-07 04:35:00.478667 (MainThread): Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f2d0ddcee20>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f2d0ddce6d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f2d0de7d130>]}
2022-06-07 04:35:00.479294 (MainThread): Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f2d0ddcee20>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f2d0ddce6d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f2d0de7d130>]}
2022-06-07 04:35:00.479750 (MainThread): Flushing usage events
2022-06-07 04:35:00.913755 (MainThread): Encountered an error:
2022-06-07 04:35:00.914481 (MainThread): [Errno 13] Permission denied: 'dbt_modules'
2022-06-07 04:35:00.916934 (MainThread): Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/dbt/main.py", line 127, in main
    results, succeeded = handle_and_check(args)
  File "/home/airflow/.local/lib/python3.8/site-packages/dbt/main.py", line 205, in handle_and_check
    task, res = run_from_args(parsed)
  File "/home/airflow/.local/lib/python3.8/site-packages/dbt/main.py", line 258, in run_from_args
    results = task.run()
  File "/home/airflow/.local/lib/python3.8/site-packages/dbt/task/deps.py", line 46, in run
    system.make_directory(self.config.modules_path)
  File "/home/airflow/.local/lib/python3.8/site-packages/dbt/clients/system.py", line 109, in make_directory
    raise e
  File "/home/airflow/.local/lib/python3.8/site-packages/dbt/clients/system.py", line 103, in make_directory
    os.makedirs(path)
  File "/usr/local/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: 'dbt_modules'

docker-compose.yaml

version: '3'
x-airflow-common:
  &airflow-common
  build: .
  # image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.2}
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - ./config/airflow.cfg:/opt/airflow/airflow.cfg
    - ./dbt:/opt/airflow/dbt
    - ~/.dbt:/home/airflow/.dbt:ro
    - ./dags:/dags
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"

dbt_project.yml

target-path: "target"  # directory which will store compiled SQL files
clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_modules"
  - "dbt_packages"

packages.yml

packages:
  - package: fishtown-analytics/dbt_utils
    version: 0.6.4

Dockerfile

ARG AIRFLOW_BASE_IMAGE
FROM ${AIRFLOW_BASE_IMAGE}

USER airflow
RUN pip install dbt \
                apache-airflow-providers-microsoft-azure==3.7.0 \
                apache-airflow-providers-snowflake \
                riotwatcher \
                pandas

Solution

  • dbt creates a dbt_modules directory (renamed to dbt_packages in dbt 1.0; your log shows dbt=0.21.1) inside your dbt project directory when you run dbt deps, which installs the packages listed in packages.yml.

    It looks like you're mounting your dbt project directory as a volume (./dbt:/opt/airflow/dbt). Most likely the user that runs dbt deps (as an Airflow task) is not authorized to write to that volume; a shell sketch for fixing the host directory's ownership is at the end of this answer.

    You may also be able to configure the modules-path (packages-install-path from dbt 1.0 onward) in your dbt_project.yml so that packages are installed to a writable local directory instead of the protected volume (see the dbt docs).
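
    A minimal sketch of that override; /tmp/dbt_modules is an arbitrary container-writable path chosen for illustration, not a dbt default:

    # in dbt_project.yml, for dbt < 1.0 (your log shows dbt=0.21.1)
    modules-path: "/tmp/dbt_modules"

    # dbt >= 1.0 equivalent
    # packages-install-path: "/tmp/dbt_packages"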
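
    Alternatively, fixing ownership of the mounted project directory on the host often resolves the error directly. A sketch, assuming the host directory is ./dbt next to docker-compose.yaml and the container runs as the 50000:50000 defaults from your user: mapping:

    # run on the Docker host, from the directory containing docker-compose.yaml;
    # the 50000 fallbacks match the AIRFLOW_UID/AIRFLOW_GID defaults above
    sudo chown -R "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}" ./dbt
    ls -ln ./dbt   # verify the numeric owner now matches the container user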