Search code examples
sqljinja2dbt

Jinja DBT for loop union with some different columns across tables


I'm creating a master events table, where almost all of the same columns are in every table, but a few tables are missing one or two columns. In those cases I'd like to replace those columns with null whenever the column doesn't exist in the table. When I run the code below, every cell in the output table is NULL.

Assume columns 1 and 2 are in every table, but column 3 is in table 1 and 2 but not 3.

{{ config(schema='MYSCHEMA', materialized='table') }}

{% set tables = ['table1', 'table2', 'table3'] %}

{% set possible_columns = ['col1', 'col2', 'col3'] %}

{% for table in tables %}
  {%- set table_columns = adapter.get_columns_in_relation( ref(table) ) -%}
  select
    {% for pc in possible_columns %}

      {% if not loop.last -%}

        {% if pc in table_columns %}
          {{ pc }},
        {% else %}
          null as {{ pc }},
        {%- endif %}

      {% else %}
        {% if pc in table_columns %}
          {{ pc }}
        {% else %}
          null as {{ pc }}
        {%- endif %}

      {% endif %}

    {%- endfor %}

  from
    {{ ref(table) }}

  {% if not loop.last -%}
    union all
  {%- endif %}
{% endfor %}

Solution

  • I'd recommend using dbt_utils.union_relations for this. It does exactly what you describe -- it creates a superset of columns from all the tables and fills in nulls where appropriate.

    {{ dbt_utils.union_relations(
        relations=[
            ref('table1'),
            ref('table2'),
            ref('table3'),
        ],
        include = ['col1', 'col2', 'col3']
    ) }}
    

    BUT, if you want to roll your own...

    The problem with your code is this line:

    {% if pc in table_columns %}
    

    adapter.get_columns_in_relation returns a list of Column objects, not a list of strings. To compare pc to the name of a column, you could use:

    {% set cols = [] %}
    {% for col in table_columns %}
        {% do cols.append(col.name) %}
    {% endfor %}
    ...
    {% if pc in cols %}
    

    You can also remove a bunch of redundant code by pushing down the if not loop.last block to just the comma, so this all becomes:

    {% for table in tables %}
        {%- set table_columns = adapter.get_columns_in_relation( ref(table) ) -%}
        {% set cols = [] %}
        {% for col in table_columns %}
            {% do cols.append(col.name) %}
        {% endfor %}
        select
        {% for pc in possible_columns %}
            {% if pc in cols %}
                {{ pc -}}
            {% else %}
                null as {{ pc -}}
            {%- endif %}{%- if not loop.last -%},{% endif %}
        {%- endfor %}
        ...