I'm trying fill NULL
values in multiple columns (different column types INT, VARCHAR) with previous NOT NULL
value in a group ordered by date. Considering following table:
CREATE TABLE IF NOT EXISTS test (
id VARCHAR,
date DATE,
value_1 INT,
value_2 VARCHAR
);
INSERT INTO test VALUES
(1, '2022-01-04', 5, 'asdf'),
(1, '2022-01-03', NULL, NULL),
(1, '2022-01-02', NULL, 'def'),
(1, '2022-01-01', 4, NULL),
(2, '2022-01-04', 1, 'a'),
(2, '2022-01-03', NULL, NULL),
(2, '2022-01-02', 2, 'b'),
(2, '2022-01-01', NULL, NULL);
I want to get here (please consider value_1 --> INTEGER, value_2 string):
One day, PostgreSQL may support the IGNORE NULLS
option for LEAD
and LAG
functions.
In the mean time, you must use window functions to build groups, then select the maximum in each group.
SELECT id, date,
MAX(value_1) OVER (PARTITION BY id, grp_1) AS value_1,
MAX(value_2) OVER (PARTITION BY id, grp_2) AS value_2
FROM(
SELECT *,
COUNT(value_1) OVER (PARTITION BY id ORDER BY Date DESC) as grp_1,
COUNT(value_2) OVER (PARTITION BY id ORDER BY Date DESC) as grp_2
FROM test
) T
ORDER BY ID, date
Explanation:
COUNT(value_1) OVER (PARTITION BY id ORDER BY Date ASC/DESC)
is a cumulative count that only increases when value_1
is not null. The result is, for each consecutive value it takes (1
, 2
, 3
), there will be 1 record where value_1
is not null, and 0, 1 or several records where the field is null.
Example:
ORDER BY Date
specified in the window);value_1 is null
=> count
is 0.value_1 is not null
=> count
is 1.value_1 is null
=> count
stays at 1.value_1 is null
=> count
stays at 1.value_1 is not null
=> count
is 2.Using this pattern, the MAX(value_1) OVER (PARTITION BY id, grp_1)
works on this partition of record by taking the non-null value out of each group.
[count = 0]
has only 1 record with null
=> The max
is null.[count = 1]
has 1 non null and 2 null values => The max
is the non null value.[count = 2]
has 1 non null value (and an unspecified number of null values) => The max
is the non null value.The pattern keeps going; for every value of count
(except 0
), the max
window function always has exactly 1 non null value to pick.
Edit:
The above query is the answer to the original version of the question. Now that dates have been changed to be in the exact opposite order as before, the 3 occurrences of ORDER BY date
must all be inverted to match. Basically, 2 negatives make a positive.
SELECT id, date,
MAX(value_1) OVER (PARTITION BY id, grp_1) AS value_1,
MAX(value_2) OVER (PARTITION BY id, grp_2) AS value_2
FROM(
SELECT *,
COUNT(value_1) OVER (PARTITION BY id ORDER BY Date ASC) as grp_1,
COUNT(value_2) OVER (PARTITION BY id ORDER BY Date ASC) as grp_2
FROM test
) T
ORDER BY ID, date DESC