Search code examples
sqlarraysjsondatabasepresto

Presto extract string from array of JSON elements


I am on Presto 0.273 and I have a complex JSON data from which I am trying to extract only specific values.

First, I ran SELECT JSON_EXTRACT(library_data, '.$books') which gets me all the books from a certain library. The problem is this returns an array of JSON objects that look like this:

[{
  "book_name":"abc", 
  "book_size":"453",
  "requestor":"27657899462"
  "comments":"this is a comment"
}, {
  "book_name":"def", 
  "book_size":"354",
  "requestor":"67657496274"
  "comments":"this is a comment"
}, ...
]

I would like the code to return just a list of the JSON objects, not an array. My intention is to later be able to loop through the JSON objects to find ones from a specific requester. Currently, when I loop through the given arrays using python, I get a range of errors around this data being a Series, hence trying to extract it properly rather.

I tried this SELECT JSON_EXTRACT(JSON_EXTRACT(data, '$.domains'), '$[0]') but this doesn't work because the index position of the object needed is not known.

I also tried SELECT array_join(array[books], ', ') but getting "Error casting array element to VARCHAR " error.

Can anyone please point me in the right direction?


Solution

  • Cast to array(json):

    SELECT CAST(JSON_EXTRACT(library_data, '.$books') as array(json))
    

    Or you can use it in unnest to flatten it to rows:

    SELECT *, 
        js_obj -- will contain single json object
    FROM table
    CROSS JOIN UNNEST CAST(JSON_EXTRACT(library_data, '.$books') as array(json)) as t(js_obj)