Search code examples
sqljinja2etldbt

Assign value of a column to variable in sql use jinja template language


I have a sql file like this to transform a table has a column include a json string

{{ config(materialized='table') }}

with customer_orders as (
  select
    time,
    data as jsonData,

{% set my_dict = fromjson( jsonData ) %}
{% do log("Printout: " ~ my_dict, info=true) %}

  from `warehouses.raw_data.customer_orders`

  limit 5
)

select *
from customer_orders

When I run dbt run, it return like this:

Running with dbt=0.21.0
Encountered an error:
the JSON object must be str, bytes or bytearray, not Undefined

I even can not print out the value of column I want:


{{ config(materialized='table') }}

with customer_orders as (
  select
    time,
    tag,
    data as jsonData,

{% do log("Printout: " ~ data, info=true) %}

  from `warehouses.raw_data.customer_orders`

  limit 5
)

select *
from customer_orders
22:42:58 | Concurrency: 1 threads (target='dev')
22:42:58 | 
Printout: 
22:42:58 | Done.

But if I create another model to printout the values of jsonData:

{%- set payment_methods = dbt_utils.get_column_values(
    table=ref('customer_orders_model'),
    column='jsonData'
) -%}

{% do log(payment_methods, info=true) %}

{% for json in payment_methods %}
{% set my_dict = fromjson(json) %}
{% do log(my_dict, info=true) %}
{% endfor %}

It print out the json value I want

Running with dbt=0.21.0
This is log

Found 2 models, 0 tests, 0 snapshots, 0 analyses, 372 macros, 0 operations, 0 seed files, 0 sources, 0 exposures

21:41:15 | Concurrency: 1 threads (target='dev')
21:41:15 |

['{"log": "ok", "path": "/var/log/containers/...log", "time": "2021-10-26T08:50:52.412932061Z", "offset": 527, "stream": "stdout", "@timestamp": 1635238252.412932}']

{'log': 'ok', 'path': '/var/log/containers/...log', 'time': '2021-10-26T08:50:52.412932061Z', 'offset': 527, 'stream': 'stdout', '@timestamp': 1635238252.412932}

21:41:21 | Done.

But I want to process this jsonData with in a model file like customer_orders_model above.

How can I get value of a column and assign it to a variable and continue to process whatever I want (check if in json have a key I want and set it value to new column).

Notes: My purpose is that: In my table, has a json string column, I want extract this json string column into many columns so I can easily write sql query what I want.


Solution

  • In case BigQuery database, Google has a JSON functions in Standard SQL

    If your column is JSON string, I think you can use JSON_EXTRACT to get value of the key you want

    EX:

    with customer_orders as (
      select
        time,
        tag,
        data as jsonData,
        json_extract(data, '$.log') AS log,
      from `dc-warehouses.raw_data.logs_trackfoe_prod`
      limit 5
    )
    select *
    from customer_orders