Here are two tables:
table_a:
name year var
---------------
john 2010 a
john 2011 a
john 2012 c
alex 2020 b
alex 2021 c
tim 2015 NULL
tim 2016 NULL
joe 2010 NULL
joe 2011 NULL
jessica 2000 NULL
jessica 2001 NULL
table_b
name year var
--------------
sara 2001 a
sara 2002 b
tim 2005 c
tim 2006 d
tim 2021 f
jessica 2020 z
Here is what I am trying to accomplish:
table_a
table_b
table_a
) has a row in table_b
with a year (table_b
) that occurs BEFORE the year in table_a
table_a
with the value of var (table_b
) that is closest to the earliest year (table_a
)I tried this:
WITH min_year AS (
SELECT name, MIN(year) as min_year
FROM table_a
GROUP BY name
),
b_filtered AS (
SELECT b.name, MAX(b.year) as year, b.var
FROM table_b b
INNER JOIN min_year m ON b.name = m.name AND b.year < m.min_year
GROUP BY b.name
)
SELECT a.name, a.year,
CASE
WHEN a.var IS NULL AND b.name IS NOT NULL THEN b.var
ELSE a.var
END as var_mod
FROM table_a a
LEFT JOIN b_filtered b
ON a.name = b.name;
But I got the wrong output:
name year var_mod
john 2010 a
john 2011 a
john 2012 c
alex 2020 b
alex 2021 c
tim 2015 NULL
tim 2016 NULL
joe 2010 NULL
joe 2011 NULL
jessica 2000 NULL
jessica 2001 NULL
Correct output should be:
name year var_mod
john 2010 a
john 2011 a
john 2012 c
alex 2020 b
alex 2021 c
tim 2015 d
tim 2016 d
joe 2010 NULL
joe 2011 NULL
jessica 2000 NULL
jessica 2001 NULL
How can I correct it?
Use the handy TABLE
clause, which allows you to use a "parameterized" sub-select as below.
WITH
table_a (name, year, var) AS
(
VALUES
('john', 2010, 'a' )
, ('john', 2011, 'a' )
, ('john', 2012, 'c' )
, ('alex', 2020, 'b' )
, ('alex', 2021, 'c' )
, ('tim', 2015, NULL)
, ('tim', 2016, NULL)
, ('joe', 2010, NULL)
, ('joe', 2011, NULL)
, ('jessica', 2000, NULL)
, ('jessica', 2001, NULL)
)
, table_b (name, year, var) AS
(
VALUES
('sara', 2001, 'a')
, ('sara', 2002, 'b')
, ('tim', 2005, 'c')
, ('tim', 2006, 'd')
, ('tim', 2021, 'f')
, ('jessica', 2020, 'z')
)
SELECT a.NAME, a.YEAR, COALESCE (a.var, b.var) AS VAR_MOD
FROM table_a a
LEFT JOIN TABLE
(
SELECT b.var
FROM table_b b
WHERE b.name = a.name and b.year < a.year
ORDER BY b.year DESC
FETCH FIRST 1 ROW ONLY
) b ON 1=1
The result is:
NAME | YEAR | VAR_MOD |
---|---|---|
john | 2010 | a |
john | 2011 | a |
john | 2012 | c |
alex | 2020 | b |
alex | 2021 | c |
tim | 2015 | d |
tim | 2016 | d |
joe | 2010 | |
joe | 2011 | |
jessica | 2000 | |
jessica | 2001 |
Update:
Another much less efficient way with a number of additional the same table references:
WITH b_filtered AS
(
SELECT a.NAME, MAX (b.YEAR) AS YEAR
FROM table_a a
JOIN table_b b ON b.NAME = a.NAME AND b.YEAR < a.YEAR
GROUP BY a.NAME
)
SELECT a.NAME, a.YEAR, COALESCE (a.var, b.var) AS VAR_MOD
FROM table_a a
LEFT JOIN
(
b_filtered f
JOIN table_b b ON b.NAME = f.NAME AND b.YEAR = f.YEAR
) ON f.NAME = a.NAME