Search code examples
google-bigquerynesteddatabase-schema

How to retrieve the list of dynamic nested keys of BigQuery nested records


My ELT tools imports my data in bigquery and generates/extends automatically the schema for dynamic nested keys (in the schema below, under properties)

It looks like this

enter image description here

How can I get the list of nested keys of a repeated record ? so for example I can group by properties when those items have said property non-null ?

I have tried

    select column_name
    from my_schema.INFORMATION_SCHEMA.COLUMNS
    where
        table_name = 'my_table
        

But it will only list first level keys

From the picture above, I want, as a first step, a SQL query that returns

message
user_id
seeker 
liker_id 
rateable_id
rateable_type
from_organization
likeable_type
company
existing_attempt 
...

My real goal through, is to group/count my data based on a non-null value of a 2nd level nested properties properties.filters.[filter_type]

The schema may evolve when our application adds more filters, so this need to be dynamically generated, I can't just hard-code the list of nested keys.

Note: this is very similar to this question How to extract all the keys in a JSON object with BigQuery but in my case my data is already in a shcema and it's not a JSON object

EDIT:

Suppose I have a list of such records with nested properties, how do I write a SQL query that adds a field "enabled_filters" which aggregates, for each item, the list of properties for wihch said property is not null ?

Example input (properties.x are dynamic and not known by the programmer)

search_id properties.filters.school properties.filters.type
1 MIT master
2 Princetown null
3 null master

Example output

search_id enabled_filters
1 ["school", "type"]
2 ["school"]
3 ["type"]

Solution

  • Have you looked at COLUMN_FIELD_PATHS? It should give you the paths for all columns.

    select field_path from my_schema.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS where table_name = '<table>'
    

    [https://cloud.google.com/bigquery/docs/information-schema-column-field-paths]