Tags: python-3.x, postgresql, architecture, dbt

dbt Duplication Check Ignores Schemas


During dbt compile, dbt runs a model duplication check to make sure models aren't stepping on each other. This check is causing me problems.

Our Architecture

Our system separates the stages of processing into different schemas, and we want to start using dbt. So, say we're importing a raw dataset called jaffles: we'll have a raw.jaffles table, a clean.jaffles table, and so on. Note that raw and clean in this example are different schemas.
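In plain PostgreSQL terms, the convention looks roughly like the sketch below. The columns and the cleaning logic in the view are invented for illustration; what matters is that both objects are called jaffles but live in separate schemas, so the database itself sees no conflict.

```sql
-- Hypothetical illustration of a stage-per-schema layout (PostgreSQL).
-- Column names and the cleaning logic are made up for the example.
create schema if not exists raw;
create schema if not exists clean;

-- Raw stage: data as it arrives.
create table raw.jaffles (
    id        integer,
    name      text,
    loaded_at timestamptz default now()
);

-- Clean stage: same object name, different schema, so no collision.
create view clean.jaffles as
select id, trim(name) as name
from raw.jaffles
where id is not null;
```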

The Problem

This breaks the duplication check. No matter how I customize the schema names or how I call ref, the check runs before any of that is taken into account: it sees two models named “jaffles”, ignores the fact that they would not actually collide because they live in different schemas, and throws an error.
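For illustration, a layout like the one below is enough to trigger the error. The folder paths, the source name landing, and the column names are all hypothetical. dbt names a model after its file, and ref() takes only that name, so ref('jaffles') has no way to say which of the two files is meant, whatever schema each one is configured for.

```sql
-- models/raw/jaffles.sql   (hypothetical path)
{{ config(schema='raw') }}
-- 'landing' is a hypothetical source that would be declared in a sources .yml file
select * from {{ source('landing', 'jaffles') }}

-- models/clean/jaffles.sql   (hypothetical path)
{{ config(schema='clean') }}
-- This model is also named 'jaffles', so the ref() below is ambiguous;
-- dbt rejects the duplicate model name regardless of the schema configs.
select id, trim(name) as name
from {{ ref('jaffles') }}
```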

Possible Solutions

  • Ideally, I'd customize how dbt builds the identifiers it uses for the duplication check so that they include the schema. But I can't find where to customize that part.
  • Alternatively, I could skip this check altogether and do the integrity check myself. But I couldn't find an option to disable it.
  • The only solution I can see that would work is to rename each of the views to be unique, which would be a lot of work and would pollute the otherwise very clean naming convention we have already established.

Solution

  • As stated in the docs, "model names need to be unique, even if they are in distinct folders".

    What you can do, though, is use custom aliases (see the docs), which let you reuse the same table/view name in two or more different schemas. In your example, you could have two differently named models, each with its own schema assigned:

    -- models/.../raw_jaffles.sql
    {{ config(alias='jaffles', schema='raw') }}
    
    -- models/.../clean_jaffles.sql
    {{ config(alias='jaffles', schema='clean') }}
    

    The file names, however, still need to differ from each other, because the file name is the model name.
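One related detail: with dbt's default generate_schema_name macro, a custom schema is appended to the target schema from your profile, so schema='raw' builds into something like <target_schema>_raw rather than raw. If the relations need to land in exactly raw and clean, the dbt docs on custom schemas describe overriding that macro in your project. A minimal sketch, assuming it is saved as macros/generate_schema_name.sql (dbt finds the macro by name, so the file name is only a convention):

```sql
{# macros/generate_schema_name.sql (file name is an assumption; the macro name is what matters).
   Uses the configured custom schema verbatim; falls back to the profile's
   target schema when a model sets no custom schema. Comments stay outside the
   macro body because anything rendered inside it becomes part of the schema name. #}
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```

With the override in place, the two models above build as raw.jaffles and clean.jaffles. Other models still reference them by model (file) name rather than by alias, e.g. {{ ref('raw_jaffles') }} inside clean_jaffles.sql.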