Search code examples
sqlsnowflake-cloud-data-platformcase

Case when for a particular column in first table based on column values in second table


I am trying to crack the following problem with SQL on Snowflake, but I kept running into problems when trying to use case when [condition] is null then.... The problem I am encountering is as follows: I have two tables A and B, where A has column Market_ID, House_ID , Revenue and B has column Market_ID, House_ID, Date where Date has the format YYYY-MM-DD (e.g. 2022-08-01). Now, I am trying to join these two tables based on the following rule: if the Date column in Table B is above a fixed threshold (say, 2022-01-15), we select Market_ID from Table A and House_ID from either Table A or Table B depending on the last date that is lower than the fixed threshold. If the Date column in Table B is less than certain value, then we select Market_ID from Table A and set House_ID to be NaN. In any case, the end goal is to cover all the Market_ID and House_ID in Table A, while Table B is playing the role of conditional filter more or less.

Sample Data

Table A    Market_ID   House_ID    Revenue
        1  14023       125         80.24
        2  14050       128         91.25
        3  14101       196         150.25
        4  15005       197         160.5
        5  15516       189         120.3
        6  15212       600         69.5
        7  14115       209         88.2
      
Table B    Market_ID   House_ID    Date
        1  14023       125         2021-11-12
        2  14050       223         2022-05-22
        3  14101       151         2022-07-15
        4  14101       196         2021-03-21
        5  15005       189         2022-06-22
        6  15005       595         2022-11-18
        7  15516       209         2021-06-13
        8  15516       313         2021-05-15
        9  15212       595         2021-12-25
       10  15212       600         2022-03-05
       11  14115       300         2022-05-07
       12  14115       209         2022-10-06

Expected output. Given a fixed date, like 2022-01-15, then I would like to have the following output, as a result of the SQL Query:

               Market_ID   House_ID    
            1  14023       NaN -- since 2022-01-15 > 2021-11-12         
            2  14050       223 -- since 2022-01-15 < 2022-05-22        
            3  14101       151 -- since 2022-01-15 < 2022-07-15          
            4  15005       595 -- since 2022-01-15 < max(2022-11-18, 2022-06-22)        
            5  15516       NaN -- since 2022-01-15 > max(2021-06-13, 2021-05-15)         
            6  15212       600 -- since 2022-01-15 < 2022-03-05        
            7  14115       209 -- since 2022-01-15 < 2022-10-06         

Question. Could anyone please help me achieve the above goal with case when statement? I tried using

case a.House_ID
  when b.Date > [some condition here] then b.House_ID
  else NULL
end as House_ID
from Table A a
left join Table B b
on a.market_id = b.market_id` 

But it did not give me the expected output.

Question 2. What if Table A has a separate column called 'Date_from' and the rule of deciding on House_ID is changed based on whether 'Date_from' in Table A is less than 'Date' in Table B for each Market_ID. So, how to achieve the same goal above if Table A is like

Table A    Market_ID   House_ID    Revenue   Date
        1  14023       125         80.24     2022-02-15
        2  14050       128         91.25     2021-05-16
        3  14101       196         150.25    2022-07-03
        4  15005       197         160.5     2022-09-12
        5  15516       189         120.3     2021-10-15
        6  15212       600         69.5      2021-12-05
        7  14115       209         88.2      2022-08-20

Expected Output.

           Market_ID   House_ID   
        1  14023       NaN         
        2  14050       223         
        3  14101       151        
        4  15005       595        
        5  15516       209         
        6  15212       600         
        7  14115       NaN         

Solution

  • I think this one can help you:

    select
    a.Market_ID, 
    b.House_ID
    from TableA a
    left join TableB b
    on a.market_id = b.market_id
    and b.Date > '2022-01-15'
    qualify row_number() over (partition by a.House_ID order by Date DESC) = 1
    order by a.Market_ID; 
    
    +-----------+----------+
    | MARKET_ID | HOUSE_ID |
    +-----------+----------+
    |     14023 | NULL     |
    |     14050 | 223      |
    |     14101 | 151      |
    |     14115 | 209      |
    |     15005 | 595      |
    |     15212 | 600      |
    |     15516 | NULL     |
    +-----------+----------+
    

    For question 2:

    select
    a.Market_ID, 
    b.House_ID
    from TableA a
    left join TableB b
    on a.market_id = b.market_id
    and b.Date > a.Date_From
    qualify row_number() over (partition by a.House_ID order by b.Date DESC) = 1
    order by a.Market_ID; 
    
    +-----------+----------+
    | MARKET_ID | HOUSE_ID |
    +-----------+----------+
    |     14023 | NULL     |
    |     14050 | 223      |
    |     14101 | 151      |
    |     14115 | 209      |
    |     15005 | 595      |
    |     15212 | 600      |
    |     15516 | NULL     |
    +-----------+----------+
    

    Can you check your output? I don't understand why 15212 has 600, although 14101 returns NULL. 14101 has two dates: 2022-07-15 and 2021-03-21, so it should return 151.