In Athena how can I structure a select statement to query the below by timestamp? The data is stored as a string
[{
"data": [{
"ct": "26.7"
}, {
"ct": "24.9",
}, {
"ct": "26.8",
}],
"timestamp": "1658102460"
}, {
"data": [{
"ct": "26.7",
}, {
"ct": "25.0",
}],
"timestamp": "1658102520"
}]
I tried the below but it just came back empty.
SELECT json_extract_scalar(insights, '$.timestamp') as ts
FROM history
What I am trying to get to is returning only the data where a timestamp is between X & Y.
When I try doing this as a struct and a cross join with unnest it's very very slow so I am trying to find another way.
json_extract_scalar
will not help here because it returns only one value. Trino improved vastly json path support but Athena has much more older version of the Presto engine which does not support it. So you need to cast to array and use unnest
(removed trailing commas from json):
-- sample data
WITH dataset (json_str) AS (
values ('[{
"data": [{
"ct": "26.7"
}, {
"ct": "24.9"
}, {
"ct": "26.8"
}],
"timestamp": "1658102460"
}, {
"data": [{
"ct": "26.7"
}, {
"ct": "25.0"
}],
"timestamp": "1658102520"
}]')
)
-- query
select mp['timestamp'] timestamp,
mp['data'] data
from dataset,
unnest(cast(json_parse(json_str) as array(map(varchar, json)))) as t(mp)
Output:
timestamp | data |
---|---|
1658102460 | [{"ct":"26.7"},{"ct":"24.9"},{"ct":"26.8"}] |
1658102520 | [{"ct":"26.7"},{"ct":"25.0"}] |
After that you can apply filtering and process data
.