Search code examples
postgresqlaggregate-functions

postgres: using previous row value when current row value is null


In my app I need to produce a data output (to be then converted to JSON) for the data in a postgres database (which is actually temperature data gathered by sensors).

The data is stored in device_history_log as follows (for data between 11am and noon):

row_id;deviceid;sensor_value_raw;last_update
104401;20865735;21.56;"2015-06-10 11:00:14"
104432;493417852;23.9;"2015-06-10 11:00:58"
104516;20865735;21.06;"2015-06-10 11:05:14"
104578;493417852;23.7;"2015-06-10 11:06:43"
104583;20865735;21.12;"2015-06-10 11:13:21"
104601;20865735;21.18;"2015-06-10 11:17:25"
104623;493417852;22.2;"2015-06-10 11:18:21"
104642;20865735;21.25;"2015-06-10 11:21:29"
104937;20865735;21.31;"2015-06-10 11:25:33"
105081;20865735;21.37;"2015-06-10 11:33:41"
105154;20865735;21.43;"2015-06-10 11:37:45"
105303;493417852;24;"2015-06-10 11:46:09"
105358;20865735;21.62;"2015-06-10 11:49:58"

I want to get 5 min averages for the data and so far I'm using this query:

SELECT grid.t5||'.000000' as ts, 
            CASE
                WHEN avg(t.sensor_value_raw) ISNULL THEN -1
                ELSE avg(t.sensor_value_raw) 
            END AS sensorvalue
FROM (

      SELECT date_trunc('min', time_series) as t5 
      FROM generate_series('2015-06-10 11:00:00'::timestamp, '2015-06-11 12:00:00', 
                   '5 min') as time_series

   ) grid
LEFT JOIN device_history_log t 
     ON t.last_update >= grid.t5 AND t.last_update <  grid.t5 +  interval '5 min' AND (t.deviceid = 493417852 or t.deviceid = 20865735)
GROUP  BY grid.t5
ORDER  BY grid.t5

The resulting data looks like this:

"2015-06-10 11:00:00.000000";23.8999996185303
"2015-06-10 11:05:00.000000";21.0599994659424
"2015-06-10 11:10:00.000000";21.1200008392334
"2015-06-10 11:15:00.000000";21.1800003051758
"2015-06-10 11:20:00.000000";21.25
"2015-06-10 11:25:00.000000";21.3099994659424
"2015-06-10 11:30:00.000000";21.3700008392334
"2015-06-10 11:35:00.000000";21.4300003051758
"2015-06-10 11:40:00.000000";-1
"2015-06-10 11:45:00.000000";22.8100004196167
"2015-06-10 11:50:00.000000";-1
"2015-06-10 11:55:00.000000";-1
"2015-06-10 12:00:00.000000";22.9250001907349

I need the averages as there are more than one sensors reporting back thus need to have an average value.

My question is if it is possible for the value of 10:40:00 instead of -1 to have the previous value as the temperature sensors only report a 'change' in the temperature so 'no data' means 'no change'.

Also for the first values that are -1 (thus no previous data) if possible I'd like them removed/ignored which I can do when parsing the data but if it can be done a query level even better.

Thus my ideal result set would be like this:

"2015-06-10 11:00:00.000000";23.8999996185303
"2015-06-10 11:05:00.000000";21.0599994659424
"2015-06-10 11:10:00.000000";21.1200008392334
"2015-06-10 11:15:00.000000";21.1800003051758
"2015-06-10 11:20:00.000000";21.25
"2015-06-10 11:25:00.000000";21.3099994659424
"2015-06-10 11:30:00.000000";21.3700008392334
"2015-06-10 11:35:00.000000";21.4300003051758
"2015-06-10 11:40:00.000000";21.4300003051758
"2015-06-10 11:45:00.000000";22.8100004196167
"2015-06-10 11:50:00.000000";22.8100004196167
"2015-06-10 11:55:00.000000";22.8100004196167
"2015-06-10 12:00:00.000000";22.9250001907349

Thank you!


Solution

  • The below query fills empty values in the resultset of your original query. The method consists in splitting the data into partitions according to the number of empty values and selecting the first (non-empty) value from each partition (add * to the select to see how it works).

    WITH survey AS (
    
        SELECT grid.t5||'.000000' as ts, 
            CASE
                WHEN avg(t.sensor_value_raw) ISNULL THEN -1
                ELSE avg(t.sensor_value_raw) 
            END AS sensorvalue
        FROM (
    
              SELECT date_trunc('min', time_series) as t5 
              FROM generate_series('2015-06-10 11:00:00'::timestamp, '2015-06-11 12:00:00', 
                           '5 min') as time_series
    
           ) grid
        LEFT JOIN device_history_log t 
             ON t.last_update >= grid.t5 AND t.last_update <  grid.t5 +  interval '5 min' AND (t.deviceid = 493417852 or t.deviceid = 20865735)
        GROUP  BY grid.t5
        ORDER  BY grid.t5)
    
    SELECT
        ts, first_value(sensorvalue) OVER (PARTITION BY part ORDER BY ts) sensorvalue
    FROM (  
        SELECT *, sum((sensorvalue != -1)::int) OVER (ORDER BY ts) part
        FROM survey) alias