Search code examples
dbt

DBT custom schema using folder structure


is there a way in DBT to create custom schemas for a model in a derived way by looking at the folder structure?

For example, say this is my structure:

models
└-- product1
    └-- team1
    |   └-- model1.sql
    └-- team2
        └-- model2.sql

In this case, model1.sql would be created in the schema product1_team1 whereas model2.sql would be created in the schema product1_team2. I guess I can specify those "by hand" in the dbt_project.yml file, but I was wondering if there was a way to do this in an automated way - so that every new model or folder is automatically created in the right schema.

I was looking at custom schema macros (https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-custom-schemas) but it seems to be plain jinja or simple Python built-ins. Not sure how I would be able to access folder paths in those macros.

Also, is there a way to write a macro in Python? as it could be relatively straightforward knowing the file path and with the os module.


Solution

  • You can achieve that using only Jinja functions and dbt context variables.

    As you have noticed, we can overwrite the dbt built-in macro that handles the schema's name, and luckily, there's a way to access the model's path using the node variable that is defined in the arguments of the macro.

    I used the fqn property for this example:

    {% macro generate_schema_name(custom_schema_name, node) -%}
    
        {%- set default_schema = target.schema -%}
    
        {%- if custom_schema_name is none -%}
    
            {# Check if the model does not contain a subfolder (e.g, models created at the MODELS root folder) #}
            {% if node.fqn[1:-1]|length == 0 %}
                {{ default_schema }}    
            {% else %}
                {# Concat the subfolder(s) name #}
                {% set prefix = node.fqn[1:-1]|join('_') %}
                {{ prefix | trim }}
            {% endif %}
    
        {%- else -%}
    
            {{ default_schema }}_{{ custom_schema_name | trim }}
    
        {%- endif -%}
    
    {%- endmacro %}
    
    

    enter image description here

    The fqn property will return a list based on the location of your model where the first position will be the dbt project name and the last position will be your model's name. So based on your example, we'd have the following:

    [<project_name>, 'product1', 'team1', 'model1']
    

    If you do a dbt ls --m <model_name> you'll notice that the output is exactly what fqn returns

    The node.fqn[1:-1] is the shortest and most Pythonic way to slice a list. So, the command is basically removing the first and last position of the list (project name & model name) leaving only the remaining path of your model.

    With that in mind, we have a condition to check if the model doesn't contain a subfolder, because if that's the case, we'll return just the default_schema defined in the profiles.yml. Otherwise, we proceed with the logic to transform the list into a string by using the join Jinja function.

    In case you want, it would be good to do a log of the node variable to see all the available options we have for it.