Search code examples
sqlrowamazon-athenapresto

Extract data from a nested row in Athena/Presto


I have data which is being represented as an array of row(value varchar). The arrays are variable length, but I would like to extract them into their own columns, or into an array of varchar.

Despite typeof() saying this data is a row, I don't seem to be able to access it directly. Can I convert data of type row(value varchar) directly to a varchar? Or do the same with the array?

SELECT records[1], records FROM recorddata

_col0  _col1
{value=data1}   [{value=data1}, {value=data2}, {value=data3}, {value=data4}]
{value=data6}   [{value=data6}, {value=data7}]

SELECT typeof(records[1]), typeof(records) FROM recorddata

_col0   _col1
1   row(value varchar)  array(row(value varchar))
2   row(value varchar)  array(row(value varchar))
3   row(value varchar)  array(row(value varchar))

Thanks


Solution

  • After working with this more, it seems that the answer is to cast these fields to json which turns them into arrays of varchar.

    SELECT cast(records[1] as json), cast(records as json) FROM recorddata
    

    results:

    _col0  _col1
    ["data1"]   ["data1", "data2"], "data3", "data4"]
    ["data6"]   ["data6", "data7"]