Tags: amazon-web-services, airflow, dbt, mwaa

Integrating DBT with Airflow on an MWAA instance


I was recently working on the DBT and Airflow integration on an MWAA instance. I followed this official link and still struggled, so I thought I'd post my notes here.

AIRFLOW VERSION: 2.2.2
  
DBT VERSION: 0.21.1

Solution

  • Below are the hints that should help you make better sense of that link:

    1. First of all, don't use their sample project. Create your own DBT project and upload it to the S3 bucket. To do that, create a sample directory, open it in your IDE, and in the terminal start with pip3 install dbt-snowflake (if your sink is Snowflake) if you don't already have the dbt CLI. The next step is dbt init, which prompts for a few inputs (project name, sink details) and creates the standard DBT project structure for you. No matter how hard you try, you won't be able to execute anything on the MWAA instance without a profiles.yml (a minimal sketch follows this list).
    2. Make sure this command runs fine from your project's terminal before you upload the project to S3: dbt run --project-dir . --profiles-dir .
    3. Remember that the S3 bucket URI you provide to the instance acts as the /usr/local/airflow path inside the instance. Keep this in mind when you adjust the paths in the DAG code.
    4. Based on your sink, decide whether you need dbt-postgres==0.21.1, dbt-redshift==0.21.1, or neither. Avoid unnecessary dependencies and remove anything from requirements.txt that your use case doesn't require (see the requirements.txt sketch after this list).
    5. Also, you need to add config-version: 2 below the version in the dbt_project.yml file, and for a bare dbt run command make sure you comment out the models and seeds paths (see the dbt_project.yml snippet after this list).
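
Regarding point 1, here is a minimal profiles.yml sketch for a Snowflake sink. The profile name, credentials, and Snowflake object names are all placeholders; keep the file at the project root so that `--profiles-dir .` picks it up.

```yaml
# profiles.yml -- all names and credentials below are placeholders
my_dbt_project:            # must match the `profile:` value in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: snowflake
      account: <your_snowflake_account>
      user: <your_user>
      password: <your_password>
      role: <your_role>
      database: <your_database>
      warehouse: <your_warehouse>
      schema: <your_schema>
      threads: 4
```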
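For point 5, the relevant part of dbt_project.yml could look like the snippet below. The project name is a placeholder, and note that the path keys are named source-paths / data-paths on dbt 0.21.x but model-paths / seed-paths on later versions, so check which ones your generated file uses before commenting them out.

```yaml
# dbt_project.yml (relevant lines only; "my_dbt_project" is a placeholder name)
name: my_dbt_project
version: '1.0.0'
config-version: 2          # the value point 5 refers to
profile: my_dbt_project    # must match the profile name in profiles.yml

# Paths to comment out for a bare `dbt run`, per point 5
# (source-paths / data-paths on dbt 0.21.x, model-paths / seed-paths on 1.x):
# source-paths: ["models"]
# data-paths: ["data"]
```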
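For point 4, a trimmed requirements.txt for the MWAA environment could be as small as this, assuming a Snowflake sink; swap in dbt-postgres==0.21.1 or dbt-redshift==0.21.1 instead if that is your sink.

```text
# requirements.txt -- keep only the adapter your sink actually needs
dbt-snowflake==0.21.1
```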

You can use their DAG code as-is; it works fine. Just make sure your paths are correct. These are the points that cost me a lot of time, so I wanted to share them with the community.
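
As a reference point only (this is not the official code), here is a minimal DAG sketch of the same idea: a BashOperator that runs dbt against a project shipped alongside the DAGs in S3. All paths and names are assumptions, so adjust them to wherever your project actually lands under /usr/local/airflow.

```python
# Minimal sketch: run `dbt run` from an MWAA worker against a dbt project in the dags folder.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# The S3 dags prefix you configure for the environment shows up under
# /usr/local/airflow inside the workers, e.g.
# s3://<bucket>/dags/dbt/my_dbt_project -> /usr/local/airflow/dags/dbt/my_dbt_project
DBT_PROJECT_DIR = "/usr/local/airflow/dags/dbt/my_dbt_project"  # placeholder path

with DAG(
    dag_id="dbt_run_example",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Copy the project to a writable scratch directory first: dbt writes target/ and
    # logs/ next to the project, and the dags path is not guaranteed to be writable.
    # If `dbt` is not on PATH for the worker, call it by its install path instead
    # (it typically ends up under /usr/local/airflow/.local/bin when installed via
    # requirements.txt).
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command=(
            f"cp -R {DBT_PROJECT_DIR} /tmp/my_dbt_project && "
            "cd /tmp/my_dbt_project && "
            "dbt run --project-dir . --profiles-dir ."
        ),
    )
```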