Search code examples
sqlruby-on-railspostgresqldatetimegaps-and-islands

Finding multiple consecutive dates (datetime) in Ruby on Rails / Postgresql


How can we find X consecutive dates (using by hour) that meet a condition?

EDIT: here is the SQL fiddle http://sqlfiddle.com/#!17/44928/1

Example:

Find 3 consecutive dates where aa < 2 and bb < 6 and cc < 7

Given this table called weather:

timestamp aa bb cc
01/01/2000 00:00 1 5 5
01/01/2000 01:00 5 5 5
01/01/2000 02:00 1 5 5
01/01/2000 03:00 1 5 5
01/01/2000 04:00 1 5 5
01/01/2000 05:00 1 5 5

Answer should return the 3 records from 02:00, 03:00, 04:00.

How can we do this in Ruby on Rails - or directly in SQL if that is better?

I started working on a method based on this answer: Detect consecutive dates ranges using SQL

def consecutive_dates
  the_query = "WITH t AS (
    SELECT timestamp d,ROW_NUMBER() OVER(ORDER BY timestamp) i
    FROM @d
    GROUP BY timestamp
  )
  SELECT MIN(d),MAX(d)
  FROM t
  GROUP BY DATEDIFF(hour,i,d)"

  ActiveRecord::Base.connection.execute(the_query)
end

But I was unable to get it working.


Solution

  • Assuming that you have one row every hour, then an easy way to get the first hour where this occurs uses lead():

    select t.*
    from (select t.*,
                 lead(timestamp, 2) over (order by timestamp) as timestamp_2
          from t
          where aa < 2 and bb < 6 and cc < 7
         ) t
    where timestamp_2 = timestamp + interval '2 hour';
    

    This filters on the conditions and looks at the rows two rows ahead. If it is two hours ahead, then three rows in a row match the conditions. Note: The above will return both 2020-01-01 02:00 and 2020-01-01 03:00.

    From your question you only seem to want the earliest. To handle that, use lag() as well:

    select t.*
    from (select t.*,
                 lag(timestamp) over (order by timestamp) as prev_timestamp
                 lead(timestamp, 2) over (order by timestamp) as timestamp_2
          from t
          where aa < 2 and bb < 6 and cc < 7
         ) t
    where timestamp_2 = timestamp + interval '2 hour' and
          (prev_timestamp is null or prev_timestamp < timestamp - interval '1' hour);
    

    You can generate the additional hours use generate_series() if you really need the original rows:

    select t.timestamp + n.n * interval '1 hour', aa, bb, cc
    from (select t.*,
                 lead(timestamp, 2) over (order by timestamp) as timestamp_2
          from t
          where aa < 2 and bb < 6 and cc < 7
         ) t cross join lateral
         generate_series(0, 2) n
    where timestamp_2 = timestamp + interval '2 hour';
    

    Your data seems to have precise timestamps based on the question, so the timestamp equalities will work. If the real data has more fuzziness, then the queries can be tweaked to take this into account.