
Optimizing Group By in Flux


I have a measurement with a few million rows of data containing information about around 20 thousand websites.

show tag keys from site_info:  
domain 
proxy  
http_response_code

show field keys from site_info:
responseTime
uuid
source

What I want to do is count all of the uuids for each domain over a given time frame. I have tried a query like this one:

from(bucket: "telegraf/autogen")
    |> range($range)
    |> filter(fn: (r) =>
         r._measurement == "site_info" and
         r._field == "uuid")
    |> group(columns: ["domain"])
    |> count()

However, this query can take up to 45 minutes to run for a time range of just now()-6h (presumably because it is grouping the data into 20,000+ buckets).

Any suggestions on how to optimize the query so it doesn't take such an extended amount of time, without altering the data schema?


Solution

  • I think that, for the time being, Flux's InfluxDB datastore integration is simply not optimized. The developers have announced that performance tuning is planned to start in the beta phase.
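
    Until then, one possible workaround (a sketch, assuming you are on InfluxDB 1.x with InfluxQL available, and that the bucket "telegraf/autogen" maps to database "telegraf" with retention policy "autogen") is to run the same aggregation through InfluxQL, whose storage-engine path is more mature:

    -- InfluxQL equivalent of the Flux query above:
    -- count uuid values per domain over the last 6 hours
    SELECT COUNT("uuid")
    FROM "telegraf"."autogen"."site_info"
    WHERE time >= now() - 6h
    GROUP BY "domain"

    This groups by the indexed "domain" tag directly in the storage engine rather than in the Flux layer, which in my experience is noticeably faster for high-cardinality group-bys.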