Search code examples
sqlhivepresto

Extract complex json with random key field


I am trying to extract the following JSON into its own rows like the table below in Presto query. The issue here is the name of the key/av engine name is different for each row, and I am stuck on how I can extract and iterate on the keys without knowing the value of the key.

The json is a value of a table row

{
    "Bkav":
    {
        "detected": false,
        "result": null,
    },
    "Lionic":
    {
        "detected": true,
        "result": Trojan.Generic.3611249',
    },
    ...
AV Engine Name Detected Virus Result
Bkav false null
Lionic true Trojan.Generic.3611249

I have tried to use json_extract following the documentation here https://teradata.github.io/presto/docs/141t/functions/json.html but there is no mention of extraction if we don't know the key :( I am trying to find a solution that works in both presto & hive query, is there a common query that is applicable to both?


Solution

  • You can cast your json to map(varchar, json) and process it with unnest to flatten:

    -- sample data
    WITH dataset (json_str) AS (
        VALUES (
                '{"Bkav":{"detected": false,"result": null},"Lionic":{"detected": true,"result": "Trojan.Generic.3611249"}}'
            )
    ) 
    
    --query
    select k "AV Engine Name", json_extract_scalar(v, '$.detected') "Detected Virus", json_extract_scalar(v, '$.result') "Result"
    from (
            select cast(json_parse(json_str) as map(varchar, json)) as m
            from dataset
        )
    cross join unnest (map_keys(m), map_values(m)) t(k, v)
    

    Output:

    AV Engine Name Detected Virus Result
    Bkav false
    Lionic true Trojan.Generic.3611249