sql t-sql databricks greatest-n-per-group row-number

SELECT last Row for each Customer based on DateTime

I have an archive with master data. This table is updated once a week to store attributes for each customer.

Customer | Attribute 1 | Attribute 2 | Attribute n | DateTime

It happens that a customerID is removed from one week to the next one.

Now I would like to extract for each Customer (Distinct) the last row that has been populated in this table.

How would you do that with SQL?

Solution

You can use row_number() window funciton with common table expression to do that:

with cte as
(select customerID  , Attribute1 , Attribute2 , Attributen  , DateTime,
row_number()over(partition by customerID order by DateTime desc)rn
from customerArchive)
select customerID  , Attribute1 , Attribute2 , Attributen  , DateTime from cte where rn=1

If an extra column named rn is not a problem you can use *

with cte as
(select *,row_number()over(partition by customerID order by DateTime desc)rn
from customerArchive)
select * from cte where rn=1

Or you can again join the cte with customer table. It will only work if you don't have any duplicate record for a given DateTime. It will be slower

    with cte as
            (select *,row_number()over(partition by customerID order by DateTime desc)rn
            from customerArchive)
            select Customer.* from cte inner join Customer 
            on Customer.customerID=cte.customerID and Customer.DateTime=cte.DateTime
            where rn=1

*** As @nbk pointed out it’s never wise to use *. Mentioning column names might take some time but it will be much safer option.