sqlt-sqldatabricksgreatest-n-per-grouprow-number

SELECT last Row for each Customer based on DateTime


I have an archive with master data. This table is updated once a week to store attributes for each customer.

Customer | Attribute 1 | Attribute 2 | Attribute n | DateTime

It happens that a customerID is removed from one week to the next one.

Now I would like to extract for each Customer (Distinct) the last row that has been populated in this table.

How would you do that with SQL?


Solution

  • You can use row_number() window funciton with common table expression to do that:

    with cte as
    (select customerID  , Attribute1 , Attribute2 , Attributen  , DateTime,
    row_number()over(partition by customerID order by DateTime desc)rn
    from customerArchive)
    select customerID  , Attribute1 , Attribute2 , Attributen  , DateTime from cte where rn=1
    

    If an extra column named rn is not a problem you can use *

    with cte as
    (select *,row_number()over(partition by customerID order by DateTime desc)rn
    from customerArchive)
    select * from cte where rn=1
    

    Or you can again join the cte with customer table. It will only work if you don't have any duplicate record for a given DateTime. It will be slower

        with cte as
                (select *,row_number()over(partition by customerID order by DateTime desc)rn
                from customerArchive)
                select Customer.* from cte inner join Customer 
                on Customer.customerID=cte.customerID and Customer.DateTime=cte.DateTime
                where rn=1
    

    *** As @nbk pointed out it’s never wise to use *. Mentioning column names might take some time but it will be much safer option.