Search code examples
sqlamazon-web-servicesamazon-athenaprestotrino

Create flattened array from array of JSON values Athena


Say I have the following array of json variables in my data:

pets
---
[{"type":"dog", "name":"fido"}, {"type":"cat", "name":"steve"}, {"type":"dog", "name":"carl"}]
[{"type":"dog", "name":"ellie"}, {"type":"fish", "name":"goldie"}]
[{"type":"cat", "name":"simone"}, {"type":"lizard", "name":"sparky"}]

How would I structure an SQL query such that I can have an array of dog names for each of the rows that looks like this:

dog_names
---
["fido", "carl"]
["ellie"]
[]

Currently, the farthest I have gotten is filtering the data using:

filter(pets, r -> r.type= 'dog')

This filters the array properly, and in instances where the cardinality of the array is 1, allows me to correctly query using the index, but I would like to capture all names (and not break when no dogs are found).

To clarify, the data type of pets is ROW.


Solution

  • Based on the usage it seems that you have array of ROW's, so you can use transform on the filtered array in the similar way:

    transform(filter(pets, r -> r.type= 'dog'), r -> r.name)