Search code examples
arraysjsongoogle-bigqueryshopifyfivetran

extracting name/value pairs into columns and values - Shopify Data In Bigquery


We use FiveTran to extract our data from shopify and store it in BigQuery. The field "properties" within the order_line table contains what looks like an array of key/value pairs. In this case name/value. The field type is string here is an example of the contents

order_line_id   properties
9956058529877   [{"name":"_order_bump_rule_id","value":"4afx7cbw6"},{"name":"_order_bump_bump_id","value":"769d1996-b6fb-4bc3-8d41-c4d7125768c5"},{"name":"_source","value":"order-bump"}]
4467731660885   [{"name":"shipping_interval_unit_type","value":null},{"name":"charge_delay","value":null},{"name":"charge_on_day_of_week","value":null},{"name":"charge_interval_frequency","value":null},{"name":"charge_on_day_of_month","value":null},{"name":"shipping_interval_frequency","value":null},{"name":"number_charges_until_expiration","value":null}]
4467738738773   [{"name":"shipping_interval_unit_type","value":null},{"name":"charge_delay","value":null},{"name":"charge_on_day_of_week","value":null},{"name":"charge_interval_frequency","value":null},{"name":"charge_on_day_of_month","value":null},{"name":"shipping_interval_frequency","value":null},{"name":"number_charges_until_expiration","value":null}]
4578798600277   [{"name":"shipping_interval_unit_type","value":null},{"name":"charge_interval_frequency","value":null},{"name":"shipping_interval_frequency","value":null}]

I am trying to write a query that generate one row per record with a column for each of these name values:

  • shipping_interval_unit_type
  • charge_on_day_of_week
  • charge_interval_frequency
  • charge_on_day_of_month
  • subscription_id
  • number_charges_until_expiration
  • shipping_interval_frequency

and the corresponding "value". This field "properties" can contain many different "name" values and they can be in different order each time. The "name" values noted above are not always present in the "properties" field.

I've tried json functions but it doesn't seem to be properly formatted for json. I've tried unnesting it but that fails since it is a string.


Solution

  • Consider below approach

    select * from (
      select order_line_id, 
        json_extract_scalar(property, '$.name') name,
        json_extract_scalar(property, '$.value') value
      from your_table, unnest(json_extract_array(properties)) property
    )
    pivot (min(value) for name in (
      'shipping_interval_unit_type',
      'charge_on_day_of_week',
      'charge_interval_frequency',
      'charge_on_day_of_month',
      'subscription_id',
      'number_charges_until_expiration',
      'shipping_interval_frequency'
    ))