Search code examples
sqlpostgresqlwindow-functionsgaps-and-islandssql-timestamp

Postgres search missing date in hourly sequence


I have a table period_of_hours in PostgreSQL. This table contains several column but for my case is important column timestamp (timestamp without time zone) and key(character varying). I have a scheduler which add a new row to the table with period of hour. I don't want to bother you why but sometimes it skip adding to the table a new row.

As you can see from this example period from 2024-05-06 10:00:00 to 2024-05-06 15:00:00 contains all lines consecutively, then one line is skipped 2024-05-06 16:00:00. From 2024-05-06 17:00:00 to 2024-05-06 21:00:00 is Ok and two rows is missed 2024-05-06 22:00:00 and 2024-05-06 23:00:00 than 2024-05-07 02:00:00 also missed.

Table "period_of_hours"
timestamp            key 

2024-05-06 10:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 11:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 12:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 13:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 14:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 15:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'

2024-05-06 17:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 18:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 19:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 20:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 21:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'

2024-05-07 00:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 01:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'

2024-05-07 03:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 04:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 05:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'


2024-05-06 10:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 11:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 12:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 13:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 14:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 15:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'

2024-05-06 17:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 18:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 19:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 20:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-06 21:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'

2024-05-07 00:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-07 01:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'

2024-05-07 03:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-07 04:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'
2024-05-07 05:00:00 'fc85b307-15d6-369b-86b5-a4aa19871dff'

First of all i want to select all rows by key (for example '009eae73-0b55-3809-9938-4bc0a342e451') and filter result set to get all missed rows ? In real table it is more than 1000 rows.

Expected result for key column '009eae73-0b55-3809-9938-4bc0a342e451':

2024-05-06 16:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 22:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-06 23:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'
2024-05-07 02:00:00 '009eae73-0b55-3809-9938-4bc0a342e451'

This solution is works for me.

with continuous("timestamp") as (
  select 
    '2024-05-06 10:00:00' :: timestamp + n * '1 hour' :: interval 
  FROM 
    generate_series(
      0, 
      (
        select 
          extract(
            epoch 
            from 
              max(timestamp)- min(
                '2024-05-06 10:00:00' :: timestamp
              )
          )/ 3600 
        from 
          period_of_hours 
        where 
          "key" = '0a3e1588-ad59-3586-b071-d5001f5ff9a7'
      ):: integer, 
      1
    ) as n
) 
select 
  c.timestamp 
from 
  continuous c 
  left join period_of_hours p on c.timestamp = p.timestamp 
  and p.key = '0a3e1588-ad59-3586-b071-d5001f5ff9a7' 
where 
  p.timestamp is null;

is it possible to combine key in construction and solution above because i can have a list of UUID ?

key in (
    'a63ffce5-1d86-3afc-8b5a-97452f935632', 
    '0a3e1588-ad59-3586-b071-d5001f5ff9a7',
    'e5524b9b-3aca-3b80-9aab-19bfec30fb9b')```

I dont understand whats is continuous("timestamp") ? It is a function or what ?


Solution

  • You can generate a continuous version of your data set and run an anti-join against it: demo

    with continuous("timestamp") as (
    select '2024-05-06 10:00:00'::timestamp+n*'1 hour'::interval
    from generate_series( 0
                         ,(select extract(epoch from max("timestamp")
                                                    -min("timestamp"))/3600
                          from period_of_hours
                          where "key"='009eae73-0b55-3809-9938-4bc0a342e451')
                         ,1)as n)
    select c."timestamp" from continuous c
    left join period_of_hours p 
      on c."timestamp"=p."timestamp"
     and p."key"='009eae73-0b55-3809-9938-4bc0a342e451'
    where p."timestamp" is null;
    
    timestamp
    2024-05-06 16:00:00
    2024-05-06 22:00:00
    2024-05-06 23:00:00
    2024-05-07 02:00:00

    An except, a not in or <>all() could achieve the same.

    You could also treat this as gaps-and-islands problem and run a window function to spot the gaps and report their width, then generate a patch of rows using that:

    select "timestamp"+n*'1h'::interval as "timestamp"
    from (select *,extract(epoch from (lead("timestamp")over w1)-"timestamp")
                   /3600 as "diff"
          from period_of_hours
          where "key"='009eae73-0b55-3809-9938-4bc0a342e451'
          window w1 as(order by "timestamp") )_
    cross join lateral generate_series(1,"diff"-1,1) as n
    where "diff">1;
    

    If you want to list each key with all its gaps, it's enough to change the window definition a bit:

    select "key","timestamp"+n*'1h'::interval as "timestamp"
    from (select *,extract(epoch from (lead("timestamp")over w1)-"timestamp")
                   /3600 as "diff"
          from period_of_hours
          window w1 as(partition by "key" order by "timestamp") )_
    cross join lateral generate_series(1,"diff"-1,1) as n
    where "diff">1;
    
    key timestamp
    009eae73-0b55-3809-9938-4bc0a342e451 2024-05-06 16:00:00
    009eae73-0b55-3809-9938-4bc0a342e451 2024-05-06 22:00:00
    009eae73-0b55-3809-9938-4bc0a342e451 2024-05-06 23:00:00
    009eae73-0b55-3809-9938-4bc0a342e451 2024-05-07 02:00:00
    fc85b307-15d6-369b-86b5-a4aa19871dff 2024-05-06 16:00:00
    fc85b307-15d6-369b-86b5-a4aa19871dff 2024-05-06 22:00:00
    fc85b307-15d6-369b-86b5-a4aa19871dff 2024-05-06 23:00:00
    fc85b307-15d6-369b-86b5-a4aa19871dff 2024-05-07 02:00:00