Search code examples
sqlgoogle-bigqueryjinja2dbt

DBT - Use dynamic array for Pivot


I would like to use DBT to pivot a column in my BigQuery table.

Since I have more than 100 values, I want my pivoted column to be dynamic, I would like something like that:

select *
    from (  select ceiu.value, ceiu.user_id, ce.name as name
            from company_entity_item_user ceiu
            left join company_entity ce on ce.id = ceiu.company_entity_id)
    PIVOT(STRING_AGG(value) FOR name IN (select distinct name from company_entity))

The problem here is I can't use a SELECT statement inside IN. I know I can use Jinja templates with DBT, it could look like this:

...
    PIVOT(STRING_AGG(value) FOR name IN ('{{unique_company_entities}}'))
...

But I have no idea how to use a SELECT statement to create such variable. Also, since I am using BigQuery, I tried using DECLARE and SET but I don't know how to use them in DBT, if it is even possible.

Thank for your help


Solution

  • Elevating the comment by @aleix-cc to an answer, because that is the best way to do this in dbt.

    To get data from your database into the jinja context, you can use dbt's built-in run_query macro. Or if you don't mind using the dbt-utils package, you can use the get_column_values macro from that package, which will return a list of the distinct values in that column (it also guards the call to run_query with an {% if execute %} statement, which is critical to preventing dbt compilation errors).

    Assuming company_entity is already a dbt model, your model becomes:

    {% set company_list = dbt_utils.get_column_values(table=ref('company_entity'), column='name') %}
    # company_list is a jinja list of strings. We need a comma-separated
    # list of single-quoted string literals
    {% set company_csv = "'" ~ company_list | join("', '") ~ "'" %}
    
    select *
        from (  select ceiu.value, ceiu.user_id, ce.name as name
                from company_entity_item_user ceiu
                left join company_entity ce on ce.id = ceiu.company_entity_id)
        PIVOT(STRING_AGG(value) FOR name IN ({{ company_csv }})