Search code examples
grafanaamazon-timestream

Aggregating Counts in AWS Timestream Error Causes Errors


I have telemetry being pushed to AWS timestream:

measure_value::varchar IP time measure_name
test.html 192.168.1.100 2021-05-25 14:27:45 hits
blah.html 192.168.1.101 2021-05-25 14:27:45 hits
test.html 192.168.1.102 2021-05-25 14:27:46 hits

I want to have aggregates of the data displayed in timestream showing me how many hits for each uri we had for each hour.

measure_value::varchar Count time
test.html 2 2021-05-25 14:00
blah.html 1 2021-05-25 14:00

I am trying to use:

SELECT measure_value::varchar as URIs, CREATE_TIME_SERIES(time, measure_value::varchar) AS served FROM $__database.$__table WHERE $__timeFilter group by measure_value::varchar

but I'm getting the error:

ValidationException: Duplicate timestamps are not allowed in a timeseries.

Am I using the wrong function or is my data wrong?

===================

Trying @berto99's solution... I get:

graph

SELECT measure_value::varchar AS URIs, date_trunc('hour', time) AS hour, count(measure_value::varchar) as queries
FROM $__database.$__table
WHERE $__timeFilter
GROUP BY measure_value::varchar, date_trunc('hour', time)

=====================

Update #2:

graph #2 Getting there, still not 100% there.

SELECT measure_value::varchar AS URIs, bin(time, 15m) AS hour, count(measure_value::varchar) as queries
FROM $__database.$__table
WHERE $__timeFilter
GROUP BY measure_value::varchar, bin(time, 15m) order by hour

Solution

  • Finally got this all working with a combo of @Berto99's suggestions and more digging on stackoverflow - TimeStream + Grafana: not recognizing series in data.

    You have to put Berto99's suggestion into a subquery then run it through CREATE_TIME_SERIES. The final query ended up being:

    WITH binned_query AS (
    SELECT measure_value::varchar AS URIs, bin(time, 15m) AS bin_time, count(measure_value::varchar) as queries
    FROM $__database.$__table
    WHERE $__timeFilter
    GROUP BY measure_value::varchar, bin(time, 15m) order by bin_time
    )
    SELECT URIs, CREATE_TIME_SERIES(bin_time,queries) as Endpoint
    FROM binned_query 
    GROUP BY URIs
    

    Changed from using date_trunc to bin as it gives you more flexibility to do 15 minute intervals.

    Beautiful graph:

    completed graph