
SQL - missing time in date range


I have a table that records whether the ignition is on (true) or off (false) at a given time:

|--------------------------------------|
|             date          | ignition |
|---------------------------|----------|
|2019-06-03 10:15:00.000000 | false    |
|---------------------------|----------|
|2019-06-03 10:30:00.000000 | false    |
|---------------------------|----------|
|2019-06-03 11:30:00.000000 | true     |
|---------------------------|----------|
|2019-06-03 11:45:00.000000 | false    |
|---------------------------|----------|
|2019-06-03 11:55:00.000000 | false    |
|---------------------------|----------|

I wrote an SQL query that shows the ignition status per period within a date range:

SELECT min(date) as date_from, max(date) as date_to, ignition
FROM (SELECT date,
             ignition,
             row_number() over (order by date) as seqnum,
             row_number() over (partition by ignition order by date) as seqnum_s
      FROM table_data
      WHERE date >= '2019-06-01T21:00:00.000Z' AND date <= '2019-06-09T20:59:59.999Z'
    ) as raw_data
GROUP BY (seqnum - seqnum_s), ignition
ORDER BY min(date);

Query result:

|---------------------------|----------------------------|----------|
|             date_from     | date_to                    | ignition |
|---------------------------|----------------------------|----------|
|2019-06-03 10:15:00.000000 | 2019-06-03 10:30:00.000000 | false    |
|---------------------------|----------------------------|----------|
|2019-06-03 11:30:00.000000 | 2019-06-03 11:30:00.000000 | true     |
|---------------------------|----------------------------|----------|
|2019-06-03 11:45:00.000000 | 2019-06-03 11:55:00.000000 | false    |
|---------------------------|----------------------------|----------|

But the intervals from 10:30 to 11:30 and from 11:30 to 11:45 are missing from the result. I expect to get this instead:

|---------------------------|----------------------------|----------|
|             date_from     | date_to                    | ignition |
|---------------------------|----------------------------|----------|
|2019-06-03 10:15:00.000000 | 2019-06-03 11:30:00.000000 | false    |
|---------------------------|----------------------------|----------|
|2019-06-03 11:30:00.000000 | 2019-06-03 11:45:00.000000 | true     |
|---------------------------|----------------------------|----------|
|2019-06-03 11:45:00.000000 | 2019-06-03 11:55:00.000000 | false    |
|---------------------------|----------------------------|----------|

The problem is that when consecutive rows are aggregated into one block (e.g. 5 consecutive rows of false), the query takes the last false row as the end time of the block instead of the first following row (the first one that is true).
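To make the grouping visible, here is a minimal sketch that exposes the intermediate values for the sample rows. The `sample_data` CTE is a hypothetical inline copy of the five rows above, standing in for `table_data`, and the column names `grp` and `next_date` are introduced only for illustration.

```sql
-- Hypothetical inline copy of the sample rows, standing in for table_data.
WITH sample_data(date, ignition) AS (
    VALUES (timestamp '2019-06-03 10:15:00', false),
           (timestamp '2019-06-03 10:30:00', false),
           (timestamp '2019-06-03 11:30:00', true),
           (timestamp '2019-06-03 11:45:00', false),
           (timestamp '2019-06-03 11:55:00', false)
)
SELECT date,
       ignition,
       -- Same expression as (seqnum - seqnum_s): constant within each run
       -- of consecutive rows that share the same ignition value.
       row_number() OVER (ORDER BY date)
         - row_number() OVER (PARTITION BY ignition ORDER BY date) AS grp,
       -- Timestamp of the following row; NULL for the last row.
       lead(date) OVER (ORDER BY date) AS next_date
FROM sample_data
ORDER BY date;
```

For these rows `grp` comes out as 0, 0, 2, 1, 1, so grouping by `grp` and `ignition` yields the three blocks. `max(date)` of the first block is 10:30, while `next_date` of its last row is 11:30, which is the end time the expected result asks for.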


Solution

  • Use the window function lead() to fetch the next row's date_from and use it as the current row's date_to. The last row has no successor, so it keeps its own date_to. The SQL is below:

    select
        date_from,
        -- end each block where the next block starts;
        -- the last block has no successor and keeps its own max(date)
        coalesce(lead(date_from) over (order by date_from), date_to) as date_to,
        ignition
    from (
        SELECT min(date) as date_from, max(date) as date_to, ignition
        FROM (SELECT date,
                     ignition,
                     row_number() over (order by date) as seqnum,
                     row_number() over (partition by ignition order by date) as seqnum_s
              FROM table_data
              WHERE date >= '2019-06-01T21:00:00.000Z' AND date <= '2019-06-09T20:59:59.999Z'
            ) as raw_data
        GROUP BY (seqnum - seqnum_s), ignition
    ) tmp
    order by date_from;

    Result:

          date_from      |       date_to       | ignition 
    ---------------------+---------------------+----------
     2019-06-03 10:15:00 | 2019-06-03 11:30:00 | f
     2019-06-03 11:30:00 | 2019-06-03 11:45:00 | t
     2019-06-03 11:45:00 | 2019-06-03 11:55:00 | f
    (3 rows)
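As an alternative, and not the method used in the answer above, the same result can probably be reached without the row_number() grouping: keep only the rows where ignition changes, then let lead() supply each block's end. This is a sketch under stated assumptions: `sample_data` is a hypothetical inline copy of the question's rows standing in for `table_data`, the names `changes`, `prev_ignition` and `last_date` are made up for illustration, and ignition is assumed to be never NULL.

```sql
-- Hypothetical inline copy of the sample rows, standing in for table_data.
WITH sample_data(date, ignition) AS (
    VALUES (timestamp '2019-06-03 10:15:00', false),
           (timestamp '2019-06-03 10:30:00', false),
           (timestamp '2019-06-03 11:30:00', true),
           (timestamp '2019-06-03 11:45:00', false),
           (timestamp '2019-06-03 11:55:00', false)
),
changes AS (
    SELECT date,
           ignition,
           -- ignition value of the previous row; NULL for the first row
           lag(ignition) OVER (ORDER BY date) AS prev_ignition,
           -- latest timestamp overall, used to close the final block
           max(date) OVER () AS last_date
    FROM sample_data
)
SELECT date AS date_from,
       -- WHERE is applied before window functions at this query level,
       -- so lead() here sees only the rows that start a new block
       coalesce(lead(date) OVER (ORDER BY date), last_date) AS date_to,
       ignition
FROM changes
WHERE prev_ignition IS DISTINCT FROM ignition
ORDER BY date;
```

With the sample rows, the filter keeps the rows at 10:15, 11:30 and 11:45, so the output should match the three rows shown above. `IS DISTINCT FROM` is used instead of `<>` so that the first row, whose `prev_ignition` is NULL, is kept as the start of a block.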