I am trying to crack the following problem with SQL on Snowflake, but I kept running into problems when trying to use case when [condition] is null then...
. The problem I am encountering is as follows: I have two tables A
and B
, where A
has column Market_ID
, House_ID
, Revenue
and B
has column Market_ID
, House_ID
, Date
where Date
has the format YYYY-MM-DD (e.g. 2022-08-01). Now, I am trying to join these two tables based on the following rule: if the Date
column in Table B is above a fixed threshold (say, 2022-01-15), we select Market_ID
from Table A and House_ID
from either Table A or Table B depending on the last date that is lower than the fixed threshold. If the Date
column in Table B is less than certain value, then we select Market_ID
from Table A and set House_ID
to be NaN. In any case, the end goal is to cover all the Market_ID
and House_ID
in Table A, while Table B is playing the role of conditional filter
more or less.
Sample Data
Table A Market_ID House_ID Revenue
1 14023 125 80.24
2 14050 128 91.25
3 14101 196 150.25
4 15005 197 160.5
5 15516 189 120.3
6 15212 600 69.5
7 14115 209 88.2
Table B Market_ID House_ID Date
1 14023 125 2021-11-12
2 14050 223 2022-05-22
3 14101 151 2022-07-15
4 14101 196 2021-03-21
5 15005 189 2022-06-22
6 15005 595 2022-11-18
7 15516 209 2021-06-13
8 15516 313 2021-05-15
9 15212 595 2021-12-25
10 15212 600 2022-03-05
11 14115 300 2022-05-07
12 14115 209 2022-10-06
Expected output. Given a fixed date, like 2022-01-15, then I would like to have the following output, as a result of the SQL Query:
Market_ID House_ID
1 14023 NaN -- since 2022-01-15 > 2021-11-12
2 14050 223 -- since 2022-01-15 < 2022-05-22
3 14101 151 -- since 2022-01-15 < 2022-07-15
4 15005 595 -- since 2022-01-15 < max(2022-11-18, 2022-06-22)
5 15516 NaN -- since 2022-01-15 > max(2021-06-13, 2021-05-15)
6 15212 600 -- since 2022-01-15 < 2022-03-05
7 14115 209 -- since 2022-01-15 < 2022-10-06
Question. Could anyone please help me achieve the above goal with case when
statement? I tried using
case a.House_ID
when b.Date > [some condition here] then b.House_ID
else NULL
end as House_ID
from Table A a
left join Table B b
on a.market_id = b.market_id`
But it did not give me the expected output.
Question 2. What if Table A has a separate column called 'Date_from' and the rule of deciding on House_ID
is changed based on whether 'Date_from' in Table A is less than 'Date' in Table B for each Market_ID. So, how to achieve the same goal above if Table A is like
Table A Market_ID House_ID Revenue Date
1 14023 125 80.24 2022-02-15
2 14050 128 91.25 2021-05-16
3 14101 196 150.25 2022-07-03
4 15005 197 160.5 2022-09-12
5 15516 189 120.3 2021-10-15
6 15212 600 69.5 2021-12-05
7 14115 209 88.2 2022-08-20
Expected Output.
Market_ID House_ID
1 14023 NaN
2 14050 223
3 14101 151
4 15005 595
5 15516 209
6 15212 600
7 14115 NaN
I think this one can help you:
select
a.Market_ID,
b.House_ID
from TableA a
left join TableB b
on a.market_id = b.market_id
and b.Date > '2022-01-15'
qualify row_number() over (partition by a.House_ID order by Date DESC) = 1
order by a.Market_ID;
+-----------+----------+
| MARKET_ID | HOUSE_ID |
+-----------+----------+
| 14023 | NULL |
| 14050 | 223 |
| 14101 | 151 |
| 14115 | 209 |
| 15005 | 595 |
| 15212 | 600 |
| 15516 | NULL |
+-----------+----------+
For question 2:
select
a.Market_ID,
b.House_ID
from TableA a
left join TableB b
on a.market_id = b.market_id
and b.Date > a.Date_From
qualify row_number() over (partition by a.House_ID order by b.Date DESC) = 1
order by a.Market_ID;
+-----------+----------+
| MARKET_ID | HOUSE_ID |
+-----------+----------+
| 14023 | NULL |
| 14050 | 223 |
| 14101 | 151 |
| 14115 | 209 |
| 15005 | 595 |
| 15212 | 600 |
| 15516 | NULL |
+-----------+----------+
Can you check your output? I don't understand why 15212 has 600, although 14101 returns NULL. 14101 has two dates: 2022-07-15 and 2021-03-21, so it should return 151.