I get this error when running dbt deps. Am I missing something in the Dockerfile that gives access to dbt_modules? I can't find where it's located, or even what dbt_modules is, in the documentation. I've included the traceback, the relevant YAML files, and the Dockerfile below. Thanks in advance.
Traceback
2022-06-07 04:34:59.121970 (MainThread): Running with dbt=0.21.1
2022-06-07 04:34:59.972911 (MainThread): You have an incompatible version of 'pyarrow' installed (6.0.1), please install a version that adheres to: 'pyarrow<3.1.0,>=3.0.0; extra == "pandas"'
2022-06-07 04:35:00.477470 (MainThread): running dbt with arguments Namespace(cls=<class 'dbt.task.deps.DepsTask'>, debug=False, defer=None, log_cache_events=False, log_format='default', partial_parse=None, profile=None, profiles_dir='/home/airflow/.dbt', project_dir=None, record_timing_info=None, rpc_method='deps', single_threaded=False, state=None, strict=False, target=None, test_new_parser=False, use_cache=True, use_colors=None, use_experimental_parser=False, vars='{}', warn_error=False, which='deps', write_json=True)
2022-06-07 04:35:00.478141 (MainThread): Tracking: tracking
2022-06-07 04:35:00.478667 (MainThread): Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f2d0ddcee20>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f2d0ddce6d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f2d0de7d130>]}
2022-06-07 04:35:00.479294 (MainThread): Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f2d0ddcee20>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f2d0ddce6d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f2d0de7d130>]}
2022-06-07 04:35:00.479750 (MainThread): Flushing usage events
2022-06-07 04:35:00.913755 (MainThread): Encountered an error:
2022-06-07 04:35:00.914481 (MainThread): [Errno 13] Permission denied: 'dbt_modules'
2022-06-07 04:35:00.916934 (MainThread): Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/dbt/main.py", line 127, in main
    results, succeeded = handle_and_check(args)
  File "/home/airflow/.local/lib/python3.8/site-packages/dbt/main.py", line 205, in handle_and_check
    task, res = run_from_args(parsed)
  File "/home/airflow/.local/lib/python3.8/site-packages/dbt/main.py", line 258, in run_from_args
    results = task.run()
  File "/home/airflow/.local/lib/python3.8/site-packages/dbt/task/deps.py", line 46, in run
    system.make_directory(self.config.modules_path)
  File "/home/airflow/.local/lib/python3.8/site-packages/dbt/clients/system.py", line 109, in make_directory
    raise e
  File "/home/airflow/.local/lib/python3.8/site-packages/dbt/clients/system.py", line 103, in make_directory
    os.makedirs(path)
  File "/usr/local/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: 'dbt_modules'
docker-compose.yaml
version: '3'
x-airflow-common:
  &airflow-common
  build: .
  # image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.2}
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - ./config/airflow.cfg:/opt/airflow/airflow.cfg
    - ./dbt:/opt/airflow/dbt
    - ~/.dbt:/home/airflow/.dbt:ro
    - ./dags:/dags
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"
dbt_project.yml
target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
  - "target"
  - "dbt_modules"
  - "dbt_packages"
packages.yml
packages:
  - package: fishtown-analytics/dbt_utils
    version: 0.6.4
Dockerfile
FROM ${AIRFLOW_BASE_IMAGE}
USER airflow
RUN pip install dbt \
    apache-airflow-providers-microsoft-azure==3.7.0 \
    apache-airflow-providers-snowflake \
    riotwatcher \
    pandas
When you run dbt deps (which installs the dbt packages your project depends on), dbt creates a dbt_modules directory inside your dbt project directory. In version 1.0 this directory was renamed to dbt_packages.

It looks like you're mounting your dbt project directory as a volume. Most likely the user that runs dbt deps (as an Airflow task) does not have write permission on that volume.

You may be able to set modules-path (packages-install-path from 1.0 onward) in your dbt_project.yml file so that dbt writes to a local directory instead of the protected volume. The dbt documentation for dbt_project.yml covers this setting.
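
As a rough sketch of the modules-path suggestion, the dbt_project.yml fragment below points package installation at a directory outside the mounted volume. The path /tmp/dbt_modules is only an illustrative assumption (any directory the task user can write to should do), and I am assuming dbt accepts an absolute path for this setting, so treat it as something to try, not a confirmed fix.

```yaml
# dbt_project.yml (fragment)
# For dbt < 1.0, such as the 0.21.1 shown in the log above.
# /tmp/dbt_modules is a hypothetical location chosen for illustration;
# it only needs to be writable by the user that runs `dbt deps`.
modules-path: "/tmp/dbt_modules"

# From dbt 1.0 onward the key is named differently:
# packages-install-path: "/tmp/dbt_packages"
```

Because a directory like this lives inside the container and not on the mounted volume, dbt deps would need to run in the same container as any later dbt command that relies on the installed packages.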