Search code examples
google-bigquerylegacy-sql

Working with arrays with BigQuery LegacySQL


Each row in my table has a field that is an array, and I'd like to get a field from the first array entry.

For example, if my row is

[
  {
    "user_dim": {
      "user_id": "123",
      "user_properties": [
        {
          "key": "content_group",
          "value": {
            "value": {
              "string_value": "my_group"
            }
          }
        }
      ]
    },
    "event_dim": [
      {
        "name": "main_menu_item_selected",
        "timestamp_micros": "1517584420597000"
      },
      {
        "name": "screen_view",
        "timestamp_micros": "1517584420679001"
      }
    ]
  }
]

I'd like to get

user_id: 123, content_group: my_group, timestamp_1517584420597000


Solution

  • As Elliott mentioned - BigQuery Standard SQL has way much better support for ARRAYs than legacy SQL. And in general, BigQuery team recommend using Standard SQL

    So, below is for BigQuery Standard SQL (including handling wildcard stuff)

    #standardSQL
    SELECT 
      user_dim.user_id AS user_id,
      (SELECT value.value.string_value 
         FROM UNNEST(user_dim.user_properties) 
         WHERE key = 'content_group' LIMIT 1
      ) content_group,
      (SELECT event.timestamp_micros 
         FROM UNNEST(event_dim) event 
         WHERE name = 'main_menu_item_selected'
      ) ts
    FROM `project.dataset.app_events_*`
    WHERE _TABLE_SUFFIX BETWEEN '20180129' AND '20180202'   
    

    with result (for the dummy example from your question)

    Row     user_id     content_group       ts   
    1       123         my_group            1517584420597000