Search code examples
sqldatabaseamazon-athenaprestotrino

Identifying unique keys in presto


I want to identify unique keys in json schema as shown below using presto. While there are multiple examples in SO - I am unable to use any solution for below example

WITH dataset AS (
  SELECT '{"name": "Susan Smith",
           "org": "engineering",
           "projects": [{"name":"project1", "completed":false},
           {"name":"project2", "completed":true}]}'
    AS users
)

Expected output with keys

Column
name
org
projects

Reference SO links referred SO link1 : Extract all JSON keys

SO Link 2 : Presto identify unique values


Solution

  • The solutions should work just fine for you, you just need to convert your varchar column into json with json_parse (I use succinct syntax for cross join unnest):

    WITH dataset AS (
      SELECT '{"name": "Susan Smith",
               "org": "engineering",
               "projects": [{"name":"project1", "completed":false},
               {"name":"project2", "completed":true}]}'
        AS users
    )
    
    -- query
    select key
    from dataset,
    unnest(map_keys(cast(json_parse(users) as map(varchar, json)))) as t(key);
    

    Output:

    key
    name
    org
    projects