Search code examples
dbt

Configuring raw and analytics databases with dbt


I have been reading dbt's How we configure Snowflake guide which explains the rationale behind having a raw database and an analytics database. Raw data is loaded into your warehouse into raw (e.g. by using Fivetran) and analytics is used by dbt to save transformed data/views for data analysts/scientists.

However, I can't seem to find any guides on how to actually set this up. The profiles.yml file needs to point to where the raw data is, so that dbt can begin transforming. However, this file also seems to dictate the database and schema into which transformed data/views are saved.

Where in dbt's many .yml files do I specify globally where to save transformed data?


Solution

  • Set up your profiles.yml, which does NOT live in the actual project but rather in the ~/.dbt folder on your machine, such that it refers to your target database/schema. For development, this would look like what you see below. For production on dbt Cloud. Now, you just set up your sources like usual (see third block below). There is no universal sources option, just a target database/schema.

    Profiles.yml Docs and Snowflake Profile Docs

    -- profiles.yml
    
    my_profile:
      target: dev
      outputs:
        dev:
          type: snowflake
          account: <snowflake_server>
          user: my_user
          password: my_password
          role: my_role
          database: analytics
          warehouse: dev_wh
          schema: dbt_<myname>
          threads: 1
          client_session_keep_alive: False
    
    -- dbt_project.yml
    
    name: 'my_dbt_models'
    version: '1.0.0'
    config-version: 2
    
    profile: 'my_profile'
    ...
    ...
    ...
    

    Sources Docs

    -- src.yml 
    
    version: 2
    sources:
      - name: jaffle_shop
        database: raw
        tables: 
          - name: orders
    

    In the model:

    raw.jaffle_shop.orders becomes {{ source( 'jaffle_shop' , 'orders' ) }}

    Note, dbt processes this source such that it assumes the name is the schema by default, however, I've discovered that you can really name it whatever you want and add in a schema if you want to give it a special name. For example…

    sources:
      - name: my_special_name
        database: raw
        schema: jaffle_shop
        tables: 
          - name: orders
    

    In the model:

    raw.jaffle_shop.orders becomes {{ source( 'my_special_name' , 'orders' ) }}

    I hope all that made sense.