Search code examples
sqlpostgresqlplpgsqldownsampling

Downsampling a postgresql table


I have a really simple database model, two tables : object and data, linked with an 1:n relationship.

Every minute, for each object, a new data is saved. The interesting attributes are : object_id (int), created_at (timestamp) and value (varchar). Object_id and created_at are used as composite PK)

My problem is that it generate way too much data. I'm trying to find an effective way to downsample the data periodically. What I need to do :

  • don't touch the X last weeks
  • keep only one value per Y minutes for each object.

I'm not so familiar with plpgsql and I strongly suspect that there is something smart to do with date_trunc and/or my composite PK containing that date...

pgsql 9.4.10


Solution

  • delete from data
    where 
        (
        -- year of `created_at` is less then current year
        extract('year' from created_at) < extract('year' from current_date)
        or
        -- year of `created_at` is equal to current year
        extract('year' from created_at) = extract('year' from current_date)
        and
        -- number of week of `created_at` is less then current week by 3 or more
        extract('week' from current_date) - extract('week' from created_at) >= 3
        )
    and 
        -- number of minutes is not a multiple of 10
        extract('minute' from created_at)::int % 10 <> 0