Search code examples
sqlprestogaps-and-islands

Check date split periods are continuous (and spotting the gaps)


I am struggling with a similar issue like in this thread: Check date split periods are continuous

Can someone help me translate the Qsebas' answer to Presto language? As this is exactly what I'd like to achieve.

I am not sure what supposed to be the outcome of this part:

CROSS APPLY  Enumerate ( ABS(DATEDIFF(d, From_Date, To_Date))) AS NUMBERS

And how to replace it. Would it be ROW_NUMBER () as a one column and ABS(DATE_DIFF()) in the second?

My sample data:

license_plate create_timestamp delete_timestamp
AA-AAA 2019-10-08 10:47:54 \N
AA-AAA 2021-01-22 12:37:21 2021-07-21 8:27:44
AA-AAA 2021-07-19 9:10:39 \N
BB-BBB 2016-04-15 8:38:59 2021-11-04 10:51:18
BB-BBB 2018-03-13 13:56:39 2021-10-07 08:21:07
BB-BBB 2021-12-23 12:42:31 \N
CC-CCC 2019-07-26 21:22:42 2021-12-17 18:21:37
CC-CCC 2021-11-05 11:08:13 2022-02-11 08:44:22

I would like to get sth like that:

license_plate create_timestamp delete_timestamp
AA-AAA 2019-10-08 10:47:54 \N
BB-BBB 2016-04-15 8:38:59 2021-10-07 08:21:07
BB-BBB 2021-12-23 12:42:31 \N
CC-CCC 2019-07-26 21:22:42 2022-02-11 08:44:22

There can be also a column count_ranges to count how many ranges for each license plate there were. There can be many entries for every license plate and they can overlap (so sorting by the create_timestamp and comparing row by row doesn't work). The idea is to group vehicles (license plates) into:

  • 'Active' (there was no gap since first create_timestamp till now/end_date); license_plate AA-AAA in my example
  • 'Reactivated' (there was at least on gap but the car is active now) - license_plate BB-BBB
  • 'Deleted' (there was no gap since first create_timestamp and last delete_timestamp but it is not active now) - license_plate CC-CCC

I know there could be more scenarios (i.e. 'Reactivated - Deleted', 'Reactivated - Reactivated' but this split will do for now.

If I had a table that I'd like I could then assign them to groups by:

CASE WHEN count_ranges > 1 THEN 'Reactivated'
           WHEN count_ranges = 1 AND delete_timestamp is null THEN 'Active'
           WHEN count_ranges = 1 AND delete_timestamp is not null THEN 'Deleted'
END AS vehicle_status

Maybe there is a different way to achieve this?

PS. Listing the exact delete_timestamp at the end of the range and create_timestamp at the beginning of the next range for a single license plate would be useful to check if a license plate was reactivated by the same user or by someone else.

Thanks!


Solution

  • Thanks @Guru Stron, you directed me to the right solution!

    I upgraded your code by adding one column to my initial database:

    select v.*,
           max(delete_timestamp) over (partition by license_plate 
           order by create_timestamp, delete_timestamp ROWS UNBOUNDED PRECEDING) 
           as moving_max
    
    from vehicles v
    

    End then I compare next create_timestamp with previous moving_max instead of previous delete_timestamp

    Didn't check with your part of code yet. But if it worked before (not as I'd like but as you intended ;) ) and my updated database looks good with the new column (every license plate has its moving_max calculated correctly) so it can't work wrong.