Search code examples
sqlteradata

Selecting in SQL the most recent record with dates which do not match


I have a large number of records (tens of thousands) contained in one table called Former_Customer_ID_List (unique ID numbers of clients who are former as of now), which I need to join with a second table called Records_List which contains millions of rows of data seperately on these customers with a daily snapshot taken and logged in called "Recorded as of" . Given that these are former customers all dates of the snapshots can be different given each customer could have left at a different timeframe. I would like to extract the most recent "Recorded as of" date for each of the former customers from Former_Customer_ID_List and not retunr the older snapshots in Records_List . I found a way to filter this via powerquery but it seems to be less than an optimal solution.

Here's an example of the table called Records_List

Customer_ID Customer Name Recorded as of
123 Fun Company LLC 28/12/2022
123 Fun Company LLC 29/12/2022
123 Fun Company Inc 01/01/2023
458 Hello Inc 12/01/2022
458 Hello Inc 14/01/2022
458 Hello Inc 16/01/2022
999 XYZ Company Inc 03/08/2020
999 XYZ Company Inc 06/08/2020
999 XYZ Company Inc 07/08/2020
999 XYZ Company Inc 08/08/2020

Here's an example simple join query which would return all matches but does not filter the most recent records. Hence the query instead of return 20K records returns 19 million records

SELECT *
FROM Former_Customer_ID_List
INNER JOIN Records_List ON Records_List.Customer_ID = Former_Customer_ID_List.Customer_ID;

Ideally I was interested in trying to get the following result

Final part

Customer_ID Customer Name Recorded as of
123 Fun Company Inc 01/01/2023
458 Hello Inc 16/01/2022
999 XYZ Company Inc 08/08/2020

Thank you for any suggestions/advice.


Solution

  • You can first use SQL window function to get the most recent record of each customer and join with Former_Customer_ID_List to get the final result you want.

    For the 1st part ( get the most recent record of each customer), here is the query:

    with Records_List_with_rank AS (
    select
        Customer_ID,
        "Customer Name",
        "Recorded as of",
         row_number() over(partition by Customer_ID order by "Recorded as of" desc) as row_num
    from
        Records_List
    )
    select
         Customer_ID,
        "Customer Name",
        "Recorded as of"
    from
        Records_List_with_rank
    where
        row_num = 1
    
    Customer_ID Customer Name Recorded as of
    123 Fun Company Inc 2023-01-01
    458 Hello Inc 2022-01-14
    999 XYZ Company Inc 2020-08-08

    For the 2nd part to join with Former_Customer_ID_List, here is the query:

    with Records_List_with_rank AS (
    select
        Customer_ID,
        "Customer Name",
        "Recorded as of",
         row_number() over(partition by Customer_ID order by "Recorded as of" desc) as row_num
    from
        Records_List
    )
    select
        *
    FROM
    Former_Customer_ID_List
    INNER JOIN
        Records_List_with_rank 
    ON Records_List_with_rank.Customer_ID = Former_Customer_ID_List.Customer_ID
    AND Records_List_with_rank.row_num = 1