Search code examples
sqlgreatest-n-per-groupprestotrinoamazon-athena

one row for rec_id with the last change and total days | sql


I have "customers" table with this data:

cust_id start_date end_date street_id hood_id city_id
569 1/1/2022 1/15/2022 12 17 16
569 1/19/2022 1/22/2022 9 85 15
569 1/25/2022 2/25/2022 14 65 19
241 3/13/2022 3/17/2022 3 6 19
241 3/19/2022 3/22/2022 5 8 35
241 3/25/2022 3/29/2022 5 8 488

I want to stay just with one raw per cust_id and also to stay with the last changes of street_id, hood_id, and city_id columns. also, I want to create a new sum column that brings me the "total_days" (end_date - start_date). its looks like this:

cust_id street_id hood_id city_id total_days
569 14 65 19 48
241 5 8 488 11

if it will be in presto syntax it will be wonderful.

thanks all.


Solution

  • You can get the latest row per customer with row_number() and filtering, then do the date arithmetic:

    select cust_id, street_id, hood_id, city_id,
        date_diff('day', start_date, end_date) as total_days
    from (
        select c.*, row_number() over(partition by cust_id order by end_date desc) rn
        from customers c
    ) c
    where rn = 1
    

    Ah, and you seem to want the overall date difference per customer rather than that of the last row, so we can use more window functions:

    select cust_id, street_id, hood_id, city_id, total_days
    from (
        select c.*, 
            row_number() over(partition by cust_id order by end_date desc) rn,
            sum(date_diff('day', start_date, end_date)) over(partition by cust_id) as total_days
        from customers c
    ) c
    where rn = 1