sqlpostgresql

How to find overlapping date ranges in Postgresql between multiple rows?


I have a table of employees which contains information of their position, team, team leader etc. Every row has an valid_from and valid_to field that make a specific date range with other specification. If any information changed it will automatically creates a new row with valid_from = now() and set previous row to valid_to = now() - interval '1 day'.

The problem is that this automated row generation can creates overlapping date ranges as it is generated from the different spreadsheets and I need to manually correct them. So I'd like to write an sql query that will find these overlaps.

I made a view from the table:

select
  employee_id,
  rank() over (partition by employee_id order by valid_from) as rank,
  valid_from,
  valid_to
from myTable

which gives me data as below:

| employee_id | rank | valid_from | valid_to   |
|-------------|------|------------|------------|
| 1000        | 1    | 2016-11-28 | 2017-06-30 |
| 1000        | 2    | 2017-07-01 | 2018-02-26 |
| 1000        | 3    | 2018-02-25 | 2018-03-12 |
| 1001        | 1    | 2016-11-28 | 2017-07-30 |
| 1001        | 2    | 2017-07-31 | 2017-07-31 |
| 1001        | 3    | 2017-08-01 | 2017-08-01 |

and I'd like to add one more column with a true/false value or something similar if current row date range overlaps any other row within the same employee_id:

| employee_id | rank | valid_from | valid_to   | overlapp_exist |
|-------------|------|------------|------------|----------------|
| 1000        | 1    | 2016-11-28 | 2017-06-30 | false          |
| 1000        | 2    | 2017-07-01 | 2018-02-26 | true           |
| 1000        | 3    | 2018-02-25 | 2018-03-12 | true           |
| 1001        | 1    | 2016-11-28 | 2017-07-30 | false          |
| 1001        | 2    | 2017-07-31 | 2017-07-31 | false          |
| 1001        | 3    | 2017-08-01 | 2017-08-01 | false          |

The table contains about 50k rows. The query will be triggered from the frontend on demand so it should be optimised to be able to send back data as quickly as possible. Any help appreciated!


Solution

  • It won't be fast, as every row has to be matched with every other row:

    SELECT a.*, b.*
    FROM mytable AS a
       JOIN mytable AS b
          ON daterange(a.valid_from, a.valid_to) && daterange(b.valid_from, b.valid_to)
    WHERE (a.valid_from, a.valid_to) <= (b.valid_from, b.valid_to);
    

    It might be better to have an exclusion constraint on the table that prevents such data from being added in the first place.