Search code examples
sqldbt

Get column names AND types using star macro in dbt


Using the star macro, is there a way to also get the column data type (boolean, numerical, etc), in addition to the column name?

For example, this query uses the star macro to collect the column names from a reference table, saves it as an array variable column_names, and then I loop over this array and apply the max function to all the columns.

{% set column_names = star(
    from=ref_table,
    except=["a", "b", "c"],
    as_list=True)
%}

select 
    date_trunc('week', day) as week,
    name,

    {%- for col in column_names %}  
    max({{ col|lower }}) as {{ col | lower }}{%- if not loop.last %},{{ '\n  ' }}{% endif %}
    {%- endfor %}

from {{ ref('my_table_name') }}    
group by 1, 2

I would like to conditionally apply the max function to only boolean columns.

This might look something like

{%- for col in column_names %}  
    {% if is_boolean(col) %}  
    max({{ col|lower }}) as {{ col | lower }}{%- if not loop.last %},{{ '\n  ' }}{% endif %}
    {% endif %}
{%- endfor %}

but the problem is that the star macro passes the column names as a string, so it's not carrying any metadata with it.

How might I get the column data type here?

Data warehouse: Snowflake


Solution

  • You can view the source for dbt_utils.star here

    Under the hood, it uses dbt_utils.get_filtered_columns_in_relation. That macro also just returns column names. However! that macro uses the built-in adapter.get_columns_in_relation, which returns a list of Column objects, which have a data_type property.

    So your code becomes:

    {% set all_columns = adapter.get_columns_in_relation(
        ref("my_table")
    ) %}
    {% set except_col_names=["a", "b", "c"] %}
    
    select 
        date_trunc('week', day) as week,
        name,
    
        {%- for col in all_columns if col.name not in except_col_names %}  
        {% if col.data_type == 'BOOLEAN' %}  
        max({{ col.name|lower }}) as {{ col.name|lower }}{%- if not loop.last %},{{ '\n  ' }}{% endif %}
        {% endif %}
        {%- endfor %}
    
    from {{ ref('my_table_name') }}    
    group by 1, 2