Search code examples
sqlprestojson-extract

JSON_Extract from list of json string


I want to extract some values for particular keys from a table with json string as below.

raw_data ...
{"label": "XXX", "lines":[{"amount":1000, "category": "A"}, {"amount":100, "category": "B"}, {"amount":10, "category": "C"}]} ...

I am expecting an outcome like

label amount category
XXX [1000, 100, 10] ['A', 'B', 'C']

I am using the following sql query to achieve that

select 
JSON_EXTRACT(raw_data, '$.lines[*].amount') AS amount,
JSON_EXTRACT(raw_data, '$.lines[*].category') AS category,
JSON_EXTRACT(raw_data, '$.label') AS label
from table

I can get a specific element of the list with [0] , [1] etc. But the sql code doesn't work with [*]. I am getting the following error Invalid JSON path: '$.lines[*].amount'

Edit

I am using Presto


Solution

  • Json path support in Presto is very limited, so you need to do some processing manually for example with casts and array functions:

    -- sample data
    with dataset (raw_data) as (
        values '{"label": "XXX", "lines":[{"amount":1000, "category": "A"}, {"amount":100, "category": "B"}, {"amount":10, "category": "C"}]}'
    )
    
    -- query
    select label,
        transform(lines, l -> l['amount']) amount,
        transform(lines, l -> l['category']) category
    from (
        select JSON_EXTRACT(raw_data, '$.label') AS label,
            cast(JSON_EXTRACT(raw_data, '$.lines') as array(map(varchar, json))) lines
        from dataset
    );
    

    Output:

    label amount category
    XXX [1000, 100, 10] ["A", "B", "C"]

    In Trino json path support was vastly improved, so you can do next:

    -- query
    select JSON_EXTRACT(raw_data, '$.label') label,
           JSON_QUERY(raw_data, 'lax $.lines[*].amount' WITH ARRAY WRAPPER) amount,
           JSON_QUERY(raw_data, 'lax $.lines[*].category' WITH ARRAY WRAPPER) category
    from dataset;