Search code examples
snowflake-cloud-data-platformdbtprefect

Is there a way to have 3 snowflake databases in the same DBT model?


I have a dbt project which gets the data from MySQL loads in RAW db in snowflake, transforms and loads in ANALYTICS db in snowflake. Now I have a requirement to create a table in a third database PROD in snowflake which aggregates data from ANALYTICS and loads in PROD.

.

If I change the DBT_DATABASE variable to PROD, I get an error since existing dbt models fail. How do I create another variable which points target database to PROD


Solution

  • There are several ways to configure the target database for dbt-snowflake, since Snowflake databases are just logical concepts and share a single connection.

    Let's say your data in RAW is configured as a source. You can add a .yml file with the source info, specifying the database:

    version: 2
    
    sources:
      - name: mysql_replica
        database: RAW
    
        tables:
          - name: mysql_table
    

    Now we can add a model that references that source and lands the transformed data into the ANALYTICS database. We can do this by setting the target's default database to ANALYTICS in our profiles.yml:

    my-snowflake-db:
      target: dev
      outputs:
        dev:
          type: snowflake
          account: [account id]
    
          # User/password auth
          user: [username]
          password: [password]
    
          role: [user role]
          database: ANALYTICS # or "{{ env_var('DBT_DATABASE') }}"
          warehouse: [warehouse name]
          schema: [dbt schema]
          threads: [1 or more]
    

    Now we can write our model, without config, to use the database we specified in our target:

    -- my_model.sql
    select * from {{ source("mysql_replica", "mysql_table") }}
    

    You can also specify a "custom" database in the model config (docs). Unlike custom schemas, which have some complex behavior, models with custom databases just use the database you specify. So in another model, we can specify the database in a config block in the model file:

    -- my_prod_model.sql
    {{ config(database="PROD") }}
    select * from {{ ref('my_model') }}
    

    If you prefer (in recent versions of dbt), you can use the config: key in a properties file instead:

    # my_prod_model.yml
    version: 2
    models:
      - name: my_prod_model
        config:
          database: PROD
    

    Or you can configure a custom database for a single model, or directory of models, in your dbt_project.yml file:

    # dbt_project.yml
    profile: my-snowflake-db
    # other stuff up here
    ...
    models:
      # materialize models in the prod directory as tables in PROD database
      prod:
        +database: PROD
        +materialized: table
    

    FINALLY, you can use "dev" schemas in your databases to separate dev from prod, but if you don't want any dev runs of dbt to write to your PROD database, you can use a little jinja in your .yml files:

    # dbt_project.yml
    profile: my-snowflake-db
    # other stuff up here
    ...
    models:
      # materialize models in the prod directory as tables in PROD database
      # if target is prod; otherwise, write to the DEV database
      prod:
        +database: "{{ 'PROD' if target.name == 'prod' else 'DEV' }}"
        +materialized: table