Say I have a table in BQ called rating
with a struct column called rating_record
. Schema as:
[
{
"name": "id",
"mode": "NULLABLE",
"type": "STRING",
"description": null,
"fields": []
},
{
"name": "rating_record",
"mode": "NULLABLE",
"type": "RECORD",
"description": null,
"fields": [
{
"name": "high_drop",
"type": "BOOLEAN",
"fields": []
},
{
"name": "medium_bump",
"type": "BOOLEAN",
"fields": []
}
]
rating_record
contains fields high_drop
and medium_bump
and there could be many fields with suffix _drop
and _bump
with true or false values. I want to iterate this record type (struct) field using a dbt macro to create two new columns against an id
called drop_reasons
& bump_reasons
- drop_reasons
in this case would be = 'high_drop'
if the value is true.
I tried to iterate the record with a sql_statement
and using dbt_utils.get_query_results_as_dict
to get the outcome but unable to create columns accordingly.
{% set sql_statement %}
select id, rating_record from {{ ref('source_table' }}
{% endset %}
{%- set ids_and_ratings = dbt_utils.get_query_results_as_dict(sql_statement) -%}
select
{% for id in ids_and_ratings['id'] | unique -%}
{% set bump_reasons = [] %}
{% set drop_reasons = [] %}
{% for rating_record in ids_and_ratings['rating_record'] | unique -%}
{% for key, value in fromjson(rating_record).items() -%}
{% if key.endswith('bump') and value is sameas true %}
{{ bump_reasons.append(key) }}
{% elif key.endswith('drop') and value is sameas true %}
{{ drop_reasons.append(key) }}
{% endif %}
{% endfor %}
{% endfor %}
{{ print(drop_reasons) }}
{% endfor %}
from {{ ref('source_table' }}
I suppose values in the fields of the table schema is a single value rather than an array.
get_query_results_as_dict returns a key of the column and the value to be a list of the column value. In the jinja code the first loop iterate all the ids, for each id the second loop iterate all rating_record struct for all ids but not of the id in the first loop.
From the logic of your jinja code, it can be done using BigQuery.
The dummy data is
insert into database.table_name values
("01", STRUCT(True, False, True, True)),
("02", STRUCT(False, False, True, False)),
("03", STRUCT(False, True, True, False))
Added a macro to get keys in the rating_record struct
{% macro get_struct_fields() %}
{% set query %}
SELECT split(field_path, ".")[OFFSET(1)] as fields
FROM `project_id`.`region-us`.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE table_name = "struct_tbl"
and column_name = "rating_record"
and data_type = "BOOL"
{% endset %}
{% set results = run_query(query) %}
{% if execute %}
{% set fields = results.columns[0].values() %}
{% else %}
{% set fields = [] %}
{% endif %}
{%do log(fields, info=true) %}
{{ return(fields)}}
{% endmacro %}
The model is
{% set fields = get_struct_fields() %}
{% set fields = fields | join(", ") %}
select *
from
(select id, split(bump_drop, "_")[offset(1)] as bump_drop, bump_drop as
reason
from
(select id, rating_record.* from `project_id.database.table_name`)
unpivot (tf for bump_drop in ({{ fields }}))
where tf is true
)
pivot (array_agg(reason ignore nulls) as reasons for bump_drop in ("bump",
"drop"))
where
select * from (
select id, rating_record.* from `project_id.database.table_name`)
unpivot (tf for bump_drop in (high_bump, high_drop, medium_bump, medium_drop))
will result in a table with columns id, tf which contains true/false, and bump_drop with values of something_bump/something_drop.
Then select only true part and separate out the 'bump' and 'drop'. The unpivot will put reasons for bump and drop separately into an array for each id.