Search code examples
postgresqlaveragetemporal-database

Creating Hourly average based on 2 minutes before and after of time instantaneous in PostgreSQL


I have a temporal database with 2 minutes sampling frequency and I want to extract instantaneous hourly values as 00:00, 01:00, 02, ... 23 for each day.
So, I would like to get the average value from average of values :

HH-1:58, HH:00 and HH:02 = Average of HH o'clock

OR

HH-1:59, HH:01 and HH:03 = Average of HH o'clock

Sample Data1:

9/28/2007 23:51 -1.68
9/28/2007 23:53 -1.76
9/28/2007 23:55 -1.96
9/28/2007 23:57 -2.02
9/28/2007 23:59 -1.92
9/29/2007 0:01  -1.64
9/29/2007 0:03  -1.76
9/29/2007 0:05  -1.83
9/29/2007 0:07  -1.86
9/29/2007 0:09  -1.94

Expected Result:

For 00 midnight:

(-1.92+-1.64+-1.76)/3

Sample Data2:

9/28/2007 23:54 -1.44
9/28/2007 23:56 -1.58
9/28/2007 23:58 -2.01
9/29/2007 0:00  -1.52
9/29/2007 0:02  -1.48
9/29/2007 0:04  -1.46

Expected Results:

(-2.01+-1.52+-1.48)/3


Solution

  • PostgreSQL's window functions make anything involving adjacent rows a lot simpler than it used to be. Untried but should be roughly right:

    select
      date_trunc('hour', newest_time) as average_time,
      (oldest_temp + middle_temp + newest_temp) / 3 as average_temp
    from (
      select
        date_trunc('hour', sample_time) as average_time,
        lag(sample_time, 2) over w as oldest_time,
        lag(sample_time, 1) over w as middle_time,
        sample_time as newest_time,
        lag(sample_temp, 2) over w as oldest_temp,
        lag(sample_temp, 1) over w as middle_temp,
        sample_temp as newest_temp
      from
        samples
      window
        w as (order by sample_time)
    ) as s
    where
      oldest_time = newest_time - '4 minutes'::interval and
      middle_time = newest_time - '2 minutes'::interval and
      extract(minute from newest_time) in (2, 3);
    

    I've restricted this in the where clause to exactly the scenario you've described - latest value at :02 or :03, prior 2 values 2 and 4 minutes before. Just in case you have some missing data which would otherwise give odd results like averaging over a much longer interval.