I have a measurement with a few million rows of data covering around 20,000 websites.
show tag keys from site_info:
    domain
    proxy
    http_response_code

show field keys from site_info:
    responseTime
    uuid
    source
What I want to do is count all of the uuids for each website over a given time frame. I have tried writing a query like this one:
from(bucket: "telegraf/autogen")
    |> range(start: -6h)
    |> filter(fn: (r) =>
        r._measurement == "site_info" and
        r._field == "uuid")
    |> group(columns: ["domain"])
    |> count()
However, this query takes up to 45 minutes to run for a time range of just now() - 6h, presumably because it has to group the data into 20,000+ buckets. Any suggestions on how to optimize the query so it does not take such an extended amount of time, without altering the data schema?
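One restructuring that may help without changing the schema is to count within each raw series first and only then regroup: group() then only has to shuffle one small count row per series instead of every raw point. A minimal sketch, assuming the same bucket and a fixed 6h window in place of your $range variable:

from(bucket: "telegraf/autogen")
    |> range(start: -6h)
    |> filter(fn: (r) =>
        r._measurement == "site_info" and
        r._field == "uuid")
    |> count()                      // one count row per raw series
    |> group(columns: ["domain"])   // regroup the much smaller count table
    |> sum()                        // total uuid count per domain

Whether this wins depends on how much of the pipeline the engine can push down to storage, so it is worth benchmarking against your original query on a short range first.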
I think that, for the time being, Flux's InfluxDB datastore integration is simply not optimized. They have announced that performance tuning should start in the beta phase.
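Until that lands, one workaround is to issue the equivalent InfluxQL query, which goes through the mature 1.x query path. A sketch, assuming you are on InfluxDB 1.x and the bucket maps to database "telegraf" with retention policy "autogen":

SELECT count("uuid") FROM "site_info" WHERE time > now() - 6h GROUP BY "domain"

Grouping by a tag like "domain" is a well-trodden path in InfluxQL, so this should return in seconds rather than minutes for the same data.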