How to find overlapping records and select the latest record?

I am trying to get on record from a list of overlapping records in Presto(Trillo) DB. There are columns like loc_id, prod_id, line_id, start_dt, end_dt, update_dt.

The records are like this

( source records)

loc_id	prod_id	line_id	qty	start_dt	end_dt	upd_dt
200	10	1	500	30-1-2025:9:00	30-1-2025:15:00	30-1-2025:15:05
200	10	1	600	30-1-2025:11:00	30-1-2025:16:00	30-1-2025:16:05
200	10	1	550	30-1-2025:12:00	30-1-2025:18:00	30-1-2025:18:05
200	10	1	350	30-1-2025:18:00	30-1-2025:22:00	30-1-2025:22:05

There is no specific key column. In the above data, the first 3 records overlap based on start and end date timestamp. But the start date of 4th record is the end date of 3rd record.

As an output I want to select only the latest record from the first 3 records and the 4th record.
So select the latest record from the overlapping records.
If the records are not overlapping , select that records.

Expected output

loc_id	prod_id	line_id	qty	start_dt	end_dt	upd_dt
200	10	1	550	30-1-2025:12:00	30-1-2025:18:00	30-1-2025:18:05
200	10	1	350	30-1-2025:18:00	30-1-2025:22:00	30-1-2025:22:05


select 
loc_id
prod_id, 
line_id, 
start_dt,
end_dt, 
qty, 
upd_dt
from
(select *, 
rank() over (partition by a.loc_id , 
                          a.prod_id, 
                          a.line_id 
                          order by  cast(upd_dt as timestamp )desc) as rnk
from 
tbl a
) tt
--where tt.rnk = 1
order by tt.rnk

I tried using a rank function but it is difficult to rank it with time stamp. The overlapping records are an issue.

A similar sql as above. I cannot display exact SQL or data due to restrictions

Solution

If I understand the task correctly then you can rephrase it as "I want data split into groups in a way that for every row previous row ended later then the current one and from each group I need the latest record". Then you can use approach derived from gaps-and-islands:

-- sample data
WITH dataset(loc_id, prod_id, line_id, qty, start_dt, end_dt, upd_dt) as (
    values (200, 10, 1, 500, timestamp '2025-01-25 09:00', timestamp '2025-01-30 15:00', '30-1-2025:15:05'),
        (200, 10, 1, 600, timestamp '2025-01-25 11:00', timestamp '2025-01-30 16:00', '30-1-2025:16:05'),
        (200, 10, 1, 550, timestamp '2025-01-25 12:00', timestamp '2025-01-30 18:00', '30-1-2025:18:05'),
        (200, 10, 1, 350, timestamp '2025-01-25 18:00', timestamp '2025-01-30 22:00', '30-1-2025:22:05')
),

-- query parts
with_leads as (
    select *,
           lead(start_dt) over (partition by loc_id, prod_id, line_id order by start_dt) next_start -- determine when next row starts
    from dataset
),
with_groups as (
    select *,
        sum(if(next_start < end_dt, 0, 1)) over (partition by loc_id, prod_id, line_id order by start_dt) grp -- form groups
    from with_leads
),
with_rns as(
    select *,
       row_number() over(partition by loc_id, prod_id, line_id, grp order by start_dt desc) rn -- prepare to select latest started in the group
    from with_groups
)

select loc_id, prod_id, line_id, qty, start_dt, end_dt, upd_dt
from with_rns
where rn = 1;

Output:

loc_id	prod_id	line_id	qty	start_dt	end_dt	upd_dt
200	10	1	550	2025-01-25 12:00:00	2025-01-30 18:00:00	30-1-2025:18:05
200	10	1	350	2025-01-25 18:00:00	2025-01-30 22:00:00	30-1-2025:22:05

Note that this will break if you have gaps in end data (i.e. if you change the end_dt for second row to '2025-01-25 11:30')