I would like to use DBT to pivot a column in my BigQuery table.
Since I have more than 100 values, I want my pivoted column to be dynamic, I would like something like that:
select *
from ( select ceiu.value, ceiu.user_id, ce.name as name
from company_entity_item_user ceiu
left join company_entity ce on ce.id = ceiu.company_entity_id)
PIVOT(STRING_AGG(value) FOR name IN (select distinct name from company_entity))
The problem here is I can't use a SELECT
statement inside IN
.
I know I can use Jinja templates with DBT, it could look like this:
...
PIVOT(STRING_AGG(value) FOR name IN ('{{unique_company_entities}}'))
...
But I have no idea how to use a SELECT
statement to create such variable.
Also, since I am using BigQuery, I tried using DECLARE
and SET
but I don't know how to use them in DBT, if it is even possible.
Thank for your help
Elevating the comment by @aleix-cc to an answer, because that is the best way to do this in dbt.
To get data from your database into the jinja context, you can use dbt's built-in run_query
macro. Or if you don't mind using the dbt-utils
package, you can use the get_column_values
macro from that package, which will return a list of the distinct values in that column (it also guards the call to run_query
with an {% if execute %}
statement, which is critical to preventing dbt compilation errors).
Assuming company_entity
is already a dbt model, your model becomes:
{% set company_list = dbt_utils.get_column_values(table=ref('company_entity'), column='name') %}
# company_list is a jinja list of strings. We need a comma-separated
# list of single-quoted string literals
{% set company_csv = "'" ~ company_list | join("', '") ~ "'" %}
select *
from ( select ceiu.value, ceiu.user_id, ce.name as name
from company_entity_item_user ceiu
left join company_entity ce on ce.id = ceiu.company_entity_id)
PIVOT(STRING_AGG(value) FOR name IN ({{ company_csv }})