I am using Postgres 9.2.
I have the following problem:
Time | Value | Device | Sum should be
1    | v1    | 1      | v1
2    | v2    | 2      | v1 + v2
3    | v3    | 3      | v1 + v2 + v3
4    | v4    | 2      | v1 + v4 + v3
5    | v5    | 2      | v1 + v5 + v3
6    | v6    | 1      | v6 + v5 + v3
7    | v7    | 3      | v6 + v5 + v7
Essentially, for each row the sum should be taken over the most recent value (as of that row's time) of each of the N devices. A device with no value yet contributes nothing. In the example above, there are 3 devices.
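To pin down the rule independently of SQL, here is a minimal reference sketch in Python. The function name `latest_value_sums` is hypothetical, and v1..v7 from the example table are replaced by the numbers 1..7. It keeps the latest value per device and sums them after every row. The last row comes out as 6 + 5 + 7, because device 3's most recent value is v7 at that point.

```python
from typing import Dict, Iterable, List, Tuple


def latest_value_sums(rows: Iterable[Tuple[int, int, int]]) -> List[Tuple[int, int]]:
    """For each (time, value, device) row in time order, return
    (time, sum of the most recent value of every device seen so far)."""
    latest: Dict[int, int] = {}  # device -> most recent value
    out: List[Tuple[int, int]] = []
    for time, value, device in sorted(rows, key=lambda r: r[0]):
        latest[device] = value                    # replaces the device's older value
        out.append((time, sum(latest.values())))  # unseen devices contribute nothing
    return out


# The example table with v1..v7 replaced by 1..7: (time, value, device).
example = [(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 2), (5, 5, 2), (6, 6, 1), (7, 7, 3)]
print([s for _, s in latest_value_sums(example)])  # [1, 3, 6, 8, 9, 14, 18]
```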
I have tried several approaches using window functions without success. I have written a PL/pgSQL function that does what I need, but it is slow. The slowness may be due to my lack of experience with PL/pgSQL.
CREATE OR REPLACE FUNCTION timeseries.combine_series(id int[], startTime timestamp, endTime timestamp)
RETURNS setof RECORD AS $$
DECLARE
    retval   double precision = 0;
    row_data timeseries.total_active_energy%ROWTYPE;
    sz       integer = 0;
    lastVal  double precision[];  -- lastVal[i] = most recent active_power of device $1[i]
    v_rec    RECORD;
BEGIN
    SELECT INTO sz array_length($1,1);  -- number of devices
    FOR row_data IN
        SELECT * FROM timeseries.total_active_energy
        WHERE time >= startTime AND time < endTime AND device_id = ANY($1)
        ORDER BY time
    LOOP
        -- start with the current row's value ...
        retval = row_data.active_power;
        -- ... then add the last known value of every other device
        FOR i IN 1..sz LOOP
            IF $1[i] = row_data.device_id THEN
                lastVal[i] = row_data.active_power;
            ELSE
                retval = retval + COALESCE(lastVal[i], 0);
            END IF;
        END LOOP;
        SELECT row_data.time, retval INTO v_rec;
        RETURN NEXT v_rec;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;
Call:
select * from timeseries.combine_series('{552,553,554}'::int[], '2013-05-01'::timestamp, '2013-05-02'::timestamp)
AS (t timestamp with time zone, val double precision);
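The function above re-adds the last value of every other device on every row (the inner `FOR i IN 1..sz` loop). One possible refinement is to keep a single running total and, for each row, swap the device's previous value for the new one. That makes the per-row work constant instead of proportional to the number of devices. It is sketched here in Python rather than PL/pgSQL and has not been benchmarked. The name `incremental_sums` is hypothetical, and the input is the question's sample rows, assumed already ordered by time.

```python
from typing import Dict, Iterable, List, Tuple


def incremental_sums(rows: Iterable[Tuple[int, int, int]]) -> List[Tuple[int, int]]:
    """Running "sum of latest value per device", updated in O(1) per row.

    rows: (ts, active_power, device_id), assumed already ordered by ts,
    as ORDER BY time guarantees in the function.
    """
    last: Dict[int, int] = {}  # device_id -> most recent active_power
    total = 0                  # sum of last[] over all devices seen so far
    out: List[Tuple[int, int]] = []
    for ts, power, device in rows:
        # Replace this device's previous contribution (0 if unseen) with the new value.
        total += power - last.get(device, 0)
        last[device] = power
        out.append((ts, total))
    return out


# Sample rows from the question: (ts, active_power, device_id).
sample = [(1, 2, 554), (2, 3, 553), (3, 9, 553), (4, 7, 553), (5, 6, 552),
          (6, 8, 554), (7, 5, 553), (8, 7, 553), (9, 6, 552), (10, 7, 552)]
print([s for _, s in incremental_sums(sample)])
# [2, 5, 11, 9, 15, 21, 19, 21, 21, 22]
```

With double precision values, repeated add/subtract can accumulate rounding error over long series, which re-summing from the stored values avoids.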
Sample data
CREATE TEMP TABLE t (ts int, active_power real, device_id int, should_be int);
INSERT INTO t VALUES
(1,2,554,2)
,(2,3,553,5)
,(3,9,553,11)
,(4,7,553,9)
,(5,6,552,15)
,(6,8,554,21)
,(7,5,553,19)
,(8,7,553,21)
,(9,6,552,21)
,(10,7,552,22)
;
I am building on my answer to your previous question, where you presented a simpler case. See that answer for an explanation of the window-function part of the solution:
This question presents an "unpivoted" data set. To get where you want to be, you could run a cross tabulation first, reducing the problem to its simpler, previous form.
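To illustrate what the cross tabulation does to the data, here is a rough Python emulation. It produces one output row per `ts` and one value column per device, with `None` (NULL in SQL) wherever a row has no value for that device. The function name `pivot` is hypothetical. Unlike the real `crosstab()`, it does not carry the extra columns, and it assumes every device appears in the category list.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def pivot(rows: Sequence[Tuple[int, int, int]],
          categories: Sequence[int]) -> List[Tuple[int, List[Optional[int]]]]:
    """Rough emulation of a cross tabulation: one output row per ts, one value
    column per category, None where that category has no value in the row."""
    grouped: Dict[int, List[Optional[int]]] = {}
    for ts, value, device in sorted(rows, key=lambda r: r[0]):
        cols = grouped.setdefault(ts, [None] * len(categories))
        # Assumes every device is listed in categories (index() raises otherwise).
        cols[categories.index(device)] = value
    return list(grouped.items())


# Sample rows from the question: (ts, active_power, device_id).
sample = [(1, 2, 554), (2, 3, 553), (3, 9, 553), (4, 7, 553), (5, 6, 552),
          (6, 8, 554), (7, 5, 553), (8, 7, 553), (9, 6, 552), (10, 7, 552)]
for row in pivot(sample, (552, 553, 554))[:3]:
    print(row)
# (1, [None, None, 2])
# (2, [None, 3, None])
# (3, [None, 9, None])
```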
PostgreSQL ships the additional module tablefunc, which provides very fast functions for this. Run this command once per database to install it:
CREATE EXTENSION tablefunc;
Then all you need is this (including redundant columns in the result for debugging):
SELECT ts, active_power, device_id, should_be
     , COALESCE(max(a) OVER (PARTITION BY grp_a), 0)
     + COALESCE(max(b) OVER (PARTITION BY grp_b), 0)
     + COALESCE(max(c) OVER (PARTITION BY grp_c), 0) AS special_sum
FROM  (
   SELECT *
        , count(a) OVER w AS grp_a
        , count(b) OVER w AS grp_b
        , count(c) OVER w AS grp_c
   FROM   crosstab(
      'SELECT ts, active_power, device_id, should_be
            , device_id, active_power
       FROM   t
       ORDER  BY 1,2'
     ,'VALUES (552), (553), (554)'
      ) AS t (ts int, active_power int, device_id int, should_be int
            , a int, b int, c int)  -- a, b, c = devices 552, 553, 554
   WINDOW w AS (ORDER BY ts)
   ) sub
ORDER  BY ts;
This returns the desired result and should perform well.
Note that this solution relies on a small, known list of devices: (552, 553, 554) in your example. Each additional device needs its own category value, output column, group count and window term.
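The `count()` / `max()` window pair in the query acts as a forward fill. `count(a) OVER (ORDER BY ts)` skips NULLs, so the group number only increases on rows where `a` has a value. Each group therefore holds exactly one non-null value, and `max(a) OVER (PARTITION BY grp_a)` spreads that value over the whole group. Rows before the first value form group 0, which has no non-null value, hence the `COALESCE(..., 0)`. A small Python emulation follows. The function name `fill_from_groups` is hypothetical, and the input columns are the question's sample data pivoted by hand.

```python
from itertools import accumulate
from typing import Dict, List, Optional


def fill_from_groups(col: List[Optional[int]]) -> List[int]:
    """Emulate COALESCE(max(col) OVER (PARTITION BY grp), 0)
    with grp = count(col) OVER (ORDER BY ts)."""
    # Running count of non-null values = group number of each row.
    grp = list(accumulate(0 if v is None else 1 for v in col))
    # Each group g > 0 starts with its only non-null value; group 0 has none.
    group_value: Dict[int, int] = {g: v for g, v in zip(grp, col) if v is not None}
    return [group_value.get(g, 0) for g in grp]


# Pivoted sample data: a, b, c hold devices 552, 553, 554 for ts = 1..10.
a = [None, None, None, None, 6, None, None, None, 6, 7]
b = [None, 3, 9, 7, None, None, 5, 7, None, None]
c = [2, None, None, None, None, 8, None, None, None, None]

special_sum = [x + y + z for x, y, z in
               zip(fill_from_groups(a), fill_from_groups(b), fill_from_groups(c))]
print(special_sum)  # [2, 5, 11, 9, 15, 21, 19, 21, 21, 22]
```

The result matches the `should_be` column of the sample data.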
Basics about crosstab():
About extra columns:
Advanced crosstab-foo: