I have "customers" table with this data:
cust_id | start_date | end_date | street_id | hood_id | city_id |
---|---|---|---|---|---|
569 | 1/1/2022 | 1/15/2022 | 12 | 17 | 16 |
569 | 1/19/2022 | 1/22/2022 | 9 | 85 | 15 |
569 | 1/25/2022 | 2/25/2022 | 14 | 65 | 19 |
241 | 3/13/2022 | 3/17/2022 | 3 | 6 | 19 |
241 | 3/19/2022 | 3/22/2022 | 5 | 8 | 35 |
241 | 3/25/2022 | 3/29/2022 | 5 | 8 | 488 |
I want to stay just with one raw per cust_id and also to stay with the last changes of street_id, hood_id, and city_id columns. also, I want to create a new sum column that brings me the "total_days" (end_date - start_date). its looks like this:
cust_id | street_id | hood_id | city_id | total_days |
---|---|---|---|---|
569 | 14 | 65 | 19 | 48 |
241 | 5 | 8 | 488 | 11 |
if it will be in presto syntax it will be wonderful.
thanks all.
You can get the latest row per customer with row_number()
and filtering, then do the date arithmetic:
select cust_id, street_id, hood_id, city_id,
date_diff('day', start_date, end_date) as total_days
from (
select c.*, row_number() over(partition by cust_id order by end_date desc) rn
from customers c
) c
where rn = 1
Ah, and you seem to want the overall date difference per customer rather than that of the last row, so we can use more window functions:
select cust_id, street_id, hood_id, city_id, total_days
from (
select c.*,
row_number() over(partition by cust_id order by end_date desc) rn,
sum(date_diff('day', start_date, end_date)) over(partition by cust_id) as total_days
from customers c
) c
where rn = 1