Search code examples
jsongoogle-bigquerybigquery-udf

BigQuery JSON element extraction


I have a table in BigQuery with a JSON column, see below.

doc_id data
222 {...}
333 {...}

The data JSON column looks like the IDs are set as headers.

{
    "1675223776617": {
        "author": "aaa",
        "new": "2023-02-01",
        "old": null,
        "property": "asd",
        "sender": "wew"
    },
    "1675223776618": {
        "author": "aaa",
        "new": true,
        "old": null,
        "property": "asd",
        "sender": "ewew"
    },
    "1675223776619": {
        "author": "bbb",
        "new": "ySk2btk7",
        "old": null,
        "property": "qwe",
        "sender": "yyy"
    }
}

I would like to extract this JSON into this format using SQL in BigQuery.

Note, the header id isn't defined in the JSON.

doc_id id author new old property sender
222 1675223776617 aaa 2023-02-01 null asd wew
222 1675223776618 aaa true null asd ewew
222 1675223776619 bbb ySk2btk7 null qwe yyy

I tried using the JSON_EXTRACT function without any success.


Solution

  • You might consider below approach using javascript UDF.

    CREATE TEMP FUNCTION flatten_json(json STRING)
    RETURNS ARRAY<STRUCT<id STRING, author STRING, new STRING, old STRING, property STRING, sender STRING>>
    LANGUAGE js AS """
      result = [];
      for (const [key, value] of Object.entries(JSON.parse(json))) {
        value["id"] = key; result.push(value);
      }
      return result;
    """;
    
    WITH sample_table AS (
      SELECT 222 doc_id, '''{
        "1675223776617": {
            "author": "aaa",
            "new": "2023-02-01",
            "old": null,
            "property": "asd",
            "sender": "wew"
        },
        "1675223776618": {
            "author": "aaa",
            "new": true,
            "old": null,
            "property": "asd",
            "sender": "ewew"
        },
        "1675223776619": {
            "author": "bbb",
            "new": "ySk2btk7",
            "old": null,
            "property": "qwe",
            "sender": "yyy"
        }
      }''' data
    )
    SELECT doc_id, flattened.*
      FROM sample_table, UNNEST(flatten_json(json)) flattened;
    

    Query results

    enter image description here