I have two tables:
Table A, names are unique
Name | Color |
---|---|
Alex | Red |
Beck | Yellow |
Cora | Blue |
Table B, has addresses and a start/end date representing when the person lived at each address
Name | Address | Start | End |
---|---|---|---|
Alex | 41 Andover St | 2023-01-15 | 2023-06-08 |
Alex | 8 Lexington Dr | 2023-06-08 | 2023-09-17 |
Alex | 3 Sutor Ave | 2023-09-17 | NULL |
Beck | 349 Laurel Rd | 2023-08-09 | 2023-09-23 |
Beck | 155 Country Club Dr | 2023-09-23 | NULL |
Cora | 227 Henry Smith St | 2022-12-19 | NULL |
I would like to join these two tables based on a date value passed to the query. If an address was active for that date value, use it. If no addresses are found, then use the earliest active address.
Example 1, date value passed to query is 2023-01-31:
Name | Color | Address |
---|---|---|
Alex | Red | 41 Andover St |
Beck | Yellow | 349 Laurel Rd |
Cora | Blue | 227 Henry Smith St |
Example 2, date value passed to query is 2023-09-30:
Name | Color | Address |
---|---|---|
Alex | Red | 3 Sutor Ave |
Beck | Yellow | 155 Country Club Dr |
Cora | Blue | 227 Henry Smith St |
So far I have only written queries that join records when the date value exists in the start/end ranges:
SELECT *
FROM tableA a
LEFT JOIN tableB b ON a.name=b.name
AND b.start <= '2024-01-31'
AND (b.end IS NULL OR b.end >= '2024-01-31')
I am not sure how to add the second part where it joins to the earliest active address if no records match. Is this possible using only SQL? I am querying in Snowflake if relevant.
So if you build a CTE that has the first per person, then you can LEFT JOIN this as well as the in date range. Then you can use NVL2 to pick the value based on the exact match success:
with table_a(name, color) as (
select * from values
('Alex', 'Red'),
('Beck', 'Yellow'),
('Cora', 'Blue')
), table_b(name, address, start_d, end_d) as (
select * from values
('Alex', '41 Andover St', '2023-01-15', '2023-06-08'),
('Alex', '8 Lexington Dr', '2023-06-08', '2023-09-17'),
('Alex', '3 Sutor Ave', '2023-09-17', NULL),
('Beck', '349 Laurel Rd', '2023-08-09', '2023-09-23'),
('Beck', '155 Country Club Dr', '2023-09-23', NULL),
('Cora', '227 Henry Smith St', '2022-12-19', NULL)
), dates(date_d) as (
select * from values
('2023-01-31'::date),
('2023-09-30'::date)
), first_per_person as (
select *
from table_b
qualify row_number() over (partition by name order by start_d) = 1
)
SELECT
d.*,
a.*,
nvl2(b.name, b.address, f.address) as address
--nvl2(b.name, b.start_d, f.start_d) as start_d,
--nvl2(b.name, b.end_d, f.end_d) as end_d
FROM dates as d
cross join table_a as a
left join first_per_person as f
ON a.name=f.name
LEFT JOIN table_b as b
ON a.name=b.name
AND b.start_d <= d.date_d
AND (b.end_d IS NULL OR b.end_d >= d.date_d)
order by 1,2
Here I put both dates into another CTE, just so we could see it all working at the same time.
If you wanted instead of "first" the most resent to the date used, ASOF JOIN is the optimal way to get that.
This code also has the "date start/end" of the row that was used, to show how to use NVL2 to get other values, because the address line could use NVL/COALESCE but that will not work for the end
date as that might valid as null, where the primary key name
does not make sense to be null.