I want, in a request, to fill all Null values by the last known value. When it's in a table and not in a request, it's easy:
If I define and fill my table as follows:
CREATE TABLE test_fill_null (
date INTEGER,
value INTEGER
);
INSERT INTO test_fill_null VALUES
(1,2),
(2, NULL),
(3, 45),
(4,NULL),
(5, null);
SELECT * FROM test_fill_null ;
date | value
------+-------
1 | 2
2 |
3 | 45
4 |
5 |
Then I just have to fill like that:
UPDATE test_fill_null t1
SET value = (
SELECT t2.value
FROM test_fill_null t2
WHERE t2.date <= t1.date AND value IS NOT NULL
ORDER BY t2.date DESC
LIMIT 1
);
SELECT * FROM test_fill_null;
date | value
------+-------
1 | 2
2 | 2
3 | 45
4 | 45
5 | 45
But now, I'm in a request, like this one:
WITH
pre_table AS(
SELECT
id1,
id2,
tms,
CASE
WHEN tms - lag(tms) over w < interval '5 minutes' THEN NULL
ELSE id2
END as group_id
FROM
table0
window w as (partition by id1 order by tms)
)
Where the group_id is set to id2 when the previous point is distant from more than 5 minutes, null otherwise. By doing so, I want to end up with group of points that follow each other by less than 5 minutes, and gaps of more than 5 minutes between each groups.
Then I don't know how to proceed. I tried:
SELECT distinct on (id1, id2)
t0.id1,
t0.id2,
t0.tms,
t1.group_id
FROM
pre_table t0
LEFT JOIN (
select
id1,
tms,
group_id
from pre_table t2
where t2.group_id is not null
order by tms desc
) t1
ON
t1.tms <= t0.tms AND
t1.id1 = t0.id1
WHERE
t0.id1 IS NOT NULL
ORDER BY
id1,
id2,
t1.tms DESC
But in the final result I have some group with two consecutive points which are distant from more than 5 minutes. Their should be two different groups in this case.
A "select within a select" is more commonly called "subselect" or "subquery". In your particular case it's a correlated subquery. LATERAL
joins can largely replace correlated subqueries with more flexible solutions:
I don't think you need either here.
For your first case this query is probably faster and simpler, though:
SELECT date, max(value) OVER (PARTITION BY grp) AS value
FROM (
SELECT *, count(value) OVER (ORDER BY date) AS grp
FROM test_fill_null
) sub;
count()
only counts non-null values, so grp
is incremented with every non-null value
, thereby forming groups as desired. It's trivial to pick the one non-null value
per grp
in the outer SELECT
.
For your second case, I'll assume the initial order of rows is determined by (id1, id2, tms)
as indicated by one of your queries.
SELECT id1, id2, tms
, count(step) OVER (ORDER BY id1, id2, tms) AS group_id
FROM (
SELECT *, lag(tms, 1, '-infinity') OVER (PARTITION BY id1 ORDER BY id2, tms) < tms - interval '5 min' AS step
FROM table0
) sub
ORDER BY id1, id2, tms;
Adapt to your actual order. One of these might cover it:
PARTITION BY id1 ORDER BY id2 -- ignore tms
PARTITION BY id1 ORDER BY tms -- ignore id2
fiddle - with an extended examples
Old sqlfiddle
Related: