Search code examples
mysqldatepentahokettle

Find time gaps between multiple dates - Kettle


I need to find gaps between multiple start-end date rows(it can be up to 4 rows per PK). I have an input of combinations PK - start date - end date, and I need to somehow check if I have any gaps between this dates. For example:

PK          Start                    End
1        2016-01-01 00:00      2016-01-01 11:00
1        2016-01-01 10:00      2016-01-01 21:00
1        2016-01-01 13:00      2016-01-01 16:00
1        2016-01-01 18:00      2016-01-01 21:00
2        2016-01-01 11:00      2016-01-01 12:00
2 .......................

I tried using the analytical step in Kettle to get the previous row's end date, and then compare it with the start date of current row, but this won't do the trick because the times can be overlapping, or as in the example if i compare row 3 and row 4 it would report an error, even tough the time is covered by row 2. So i need a solution that would compare all the rows with the same PK, and check if there is a time gap between them. I tried to create some Kettle transformation/job or a MySQL query, but i can't seem to get this right. Any ideas/suggestions?

EDIT: Now I'm trying with something like this(simplified):

SELECT t.id, t.start,t.end, max(t2.end)
FROM t, t  t2
WHERE t2.start <= t.start
GROUP BY t.pk

But the problem now is that I'm getting just one row back per PK, with Max. I would need the 'running max'. The idea behind this it to get the Max end date until now for every PK group, and compare it with the Start date(if the result is >= 0, there is no gap). So with the example before it would be:

PK          Start                    End                 MAX
1        2016-01-01 00:00      2016-01-01 11:00      2016-01-01 11:00
1        2016-01-01 10:00      2016-01-01 21:00      2016-01-01 21:00
1        2016-01-01 13:00      2016-01-01 16:00      2016-01-01 21:00
1        2016-01-01 18:00      2016-01-01 21:00      2016-01-01 21:00
2        2016-01-01 11:00      2016-01-01 12:00      2016-01-01 12:00
2 .......................

Again, any idea how to get this to function properly? Thanks.


Solution

  • I suggest you a check using having for the max .. min delta e the sum (total) of the parts ..

     select pk, max(end) - min(start) as delta , sum(start - end) as tot
     from my_table
     goup by pk
     having delta - tot > 0
    

    eventually you can manage a bit of tolerance .. too