Search code examples
grafanainfluxdbretention

How to downsample data older than 14 days only in influxdb and keep it in the same table as raw data?


I am having a setup collecting metrics from telegraf into influxdb. Then grafana uses influxdb as data source to display graphs.

My problem is reducing disk usage, so I want to downsample old data (older than 3 days) and keep the new data (younger than 3 days) as is (raw)

I tried Retention Policy (RP) of influxdb and Continuous Queries (CQ) as described in guide: https://docs.influxdata.com/influxdb/v1.2/guides/downsampling_and_retention

         influxdb ("telegraf")
+----------------------------+
|                            |
| +-----------------------+  |
| |  table disk_raw       |  |
| |  CURRENT RP (RAW)     +---------+
| |  (deleted after 3d)   |  |      |
| +-----------------------+  |      |CQ (average 30 min of datapoints into 1)
| +-----------------------+  |      |
| |  table_disk_ds        |  |      |
| |  LONGTERM RP          +<--------+
| |(downsampled, kept 90d)|  |
| +-----------------------+  |
|                            +<----+
+----------------------------+     |
                                   |
                                   |
      grafana                      | grafana query
+----------------------------+     |
|                            |     |
|  +----------------------+  |     |
|  |    data graph        |  +-----+
|  +----------------------+  |
|                            |
+----------------------------+

The problem is - this solution is giving you 2 tables, one for raw data and one for downsampled data. CQ is constantly writing out to downsampled data. That is not so good for me as:

  • I am using grafana to query influxdb and it reads from single table to the graph. And I want one graph for both old data and new data.
  • Using 2 databases increases disk usage

Is there any way to downsample old records in the very same table?

configuring example: https://docs.influxdata.com/influxdb/v1.2/guides/downsampling_and_retention

grafana query

SELECT mean("used_percent") FROM "disk" WHERE ("device" = 'dm-0') AND $timeFilter GROUP BY time(10s) fill(none)


Solution

  • EDIT2: Here's a workaround implemented with template variables in Grafana
    https://github.com/grafana/grafana/issues/4262#issuecomment-475570324
    This seems like a really good solution.

    ORIGINAL ANSWER Looking into the example from the influxb page you linked

    CREATE CONTINUOUS QUERY "cq_30m" ON "food_data" BEGIN
      SELECT mean("website") AS "mean_website",mean("phone") AS "mean_phone"
      INTO "a_year"."orders"
      FROM "orders"
      GROUP BY time(30m)
    END
    

    If you specify the same source and target table, namely orders, into both INTO and FROM clauses then the data will be written to the same table.

    However, that does not solve your issue.
    You would still need two queries to get the data from both retention policies. If you do a generic select * from disk_raw ... Influx will use the default retention policy and return data just from there.

    The way you usually go about this is by running two queries and concatenating the results. In a single request something like
    select * from rp_short.diskraw; select * from rp_long.diskraw

    EDIT: Here is a discussion of why it's not possible to do what you (and a lot of other people) want https://github.com/influxdata/influxdb/issues/2625 And also some ways to work around it.
    Briefly, one way is to handle the downsampling and high resolution data manually (i.e not with CQ) and keep it in the same retention policy. Another is to use a proxy that would augment the query depending on the time range of the query in order to get the correct data.