Search code examples
google-bigqueryjsonpath

json_extract_scalar fails with parenthesis in key name


I've got a string field which contains raw JSON data sent from the server. However, the key contains parenthesis, which seem to be causing issues when trying to extract the data within.

Data Sample:

{"Interview (Onsite)": "2015-04-06 16:58:28"}

Extraction Attempt:

timestamp(max(json_extract_scalar(a.status_history, '$.Interview (Onsite)')))

(The 'max' function is used as status_history is a repeated field)

Error:

JSONPath parse error at: (Onsite)

I've tried multiple common ways of escaping the parenthesis, but it got me nowhere.

Would appreciate suggestions on how to circumvent - I would rather not resort to regex unless I truly have to.


Solution

  • After enabling standard SQL (uncheck "Use Legacy SQL" under "Show Options" in the UI) you can use a quoted string as part of the JSON path. For example:

    SELECT
      CAST(JSON_EXTRACT_SCALAR(
        '{"Interview (Onsite)": "2015-04-06 16:58:28"}',
        "$['Interview (Onsite)']") AS TIMESTAMP) AS t;
    

    Edit: Since you have a column that is an ARRAY<STRING>, you will need to use an ARRAY subquery to apply JSON_EXTRACT_SCALAR to each element. For example:

    WITH T AS (
      SELECT
        ['{"Interview (Onsite)": "2015-04-06 16:58:28"}',
         '{"Interview (Onsite)": "2015-11-16 08:09:10"}',
         '{"Interview (Onsite)": "2016-01-01 18:12:43"}']
         AS status_history UNION ALL
      SELECT
        ['{"Interview (Onsite)": "2016-06-25 07:01:45"}']
    )
    SELECT
      ARRAY (
        SELECT CAST(JSON_EXTRACT_SCALAR(history, "$['Interview (Onsite)']") AS TIMESTAMP)
        FROM UNNEST(status_history) AS history
      ) AS interview_times
    FROM T;
    

    Alternatively, if you don't care about preserving the structure of the array, you can "flatten" it with a join, which will return one row for each element of status_history:

    WITH T AS (
      SELECT
        ['{"Interview (Onsite)": "2015-04-06 16:58:28"}',
         '{"Interview (Onsite)": "2015-11-16 08:09:10"}',
         '{"Interview (Onsite)": "2016-01-01 18:12:43"}']
         AS status_history UNION ALL
      SELECT
        ['{"Interview (Onsite)": "2016-06-25 07:01:45"}']
    )
    SELECT
      CAST(JSON_EXTRACT_SCALAR(history, "$['Interview (Onsite)']") AS TIMESTAMP)
        AS interview_time
    FROM T CROSS JOIN UNNEST(status_history) AS history;
    

    See also the section of the migration guide on handling of repeated fields.