I have a large number of records (tens of thousands) contained in one table called Former_Customer_ID_List (unique ID numbers of clients who are former as of now), which I need to join with a second table called Records_List which contains millions of rows of data seperately on these customers with a daily snapshot taken and logged in called "Recorded as of" . Given that these are former customers all dates of the snapshots can be different given each customer could have left at a different timeframe. I would like to extract the most recent "Recorded as of" date for each of the former customers from Former_Customer_ID_List and not retunr the older snapshots in Records_List . I found a way to filter this via powerquery but it seems to be less than an optimal solution.
Here's an example of the table called Records_List
Customer_ID | Customer Name | Recorded as of |
---|---|---|
123 | Fun Company LLC | 28/12/2022 |
123 | Fun Company LLC | 29/12/2022 |
123 | Fun Company Inc | 01/01/2023 |
458 | Hello Inc | 12/01/2022 |
458 | Hello Inc | 14/01/2022 |
458 | Hello Inc | 16/01/2022 |
999 | XYZ Company Inc | 03/08/2020 |
999 | XYZ Company Inc | 06/08/2020 |
999 | XYZ Company Inc | 07/08/2020 |
999 | XYZ Company Inc | 08/08/2020 |
Here's an example simple join query which would return all matches but does not filter the most recent records. Hence the query instead of return 20K records returns 19 million records
SELECT *
FROM Former_Customer_ID_List
INNER JOIN Records_List ON Records_List.Customer_ID = Former_Customer_ID_List.Customer_ID;
Ideally I was interested in trying to get the following result
Final part
Customer_ID | Customer Name | Recorded as of |
---|---|---|
123 | Fun Company Inc | 01/01/2023 |
458 | Hello Inc | 16/01/2022 |
999 | XYZ Company Inc | 08/08/2020 |
Thank you for any suggestions/advice.
You can first use SQL window function to get the most recent record of each customer and join with Former_Customer_ID_List to get the final result you want.
For the 1st part ( get the most recent record of each customer), here is the query:
with Records_List_with_rank AS (
select
Customer_ID,
"Customer Name",
"Recorded as of",
row_number() over(partition by Customer_ID order by "Recorded as of" desc) as row_num
from
Records_List
)
select
Customer_ID,
"Customer Name",
"Recorded as of"
from
Records_List_with_rank
where
row_num = 1
Customer_ID | Customer Name | Recorded as of |
---|---|---|
123 | Fun Company Inc | 2023-01-01 |
458 | Hello Inc | 2022-01-14 |
999 | XYZ Company Inc | 2020-08-08 |
For the 2nd part to join with Former_Customer_ID_List, here is the query:
with Records_List_with_rank AS (
select
Customer_ID,
"Customer Name",
"Recorded as of",
row_number() over(partition by Customer_ID order by "Recorded as of" desc) as row_num
from
Records_List
)
select
*
FROM
Former_Customer_ID_List
INNER JOIN
Records_List_with_rank
ON Records_List_with_rank.Customer_ID = Former_Customer_ID_List.Customer_ID
AND Records_List_with_rank.row_num = 1