I need to get the intervals where a process was on each stage, and take into account when the process went back to a stage. For example:
stage_name | from_day | to_day |
---|---|---|
A | 1 | 2 |
B | 2 | 3 |
B | 3 | 4 |
B | 4 | 5 |
C | 5 | 6 |
B | 6 | 7 |
D | 7 |
the process is currently on stage D
I want to get a table like this
stage_name | from_day | to_day |
---|---|---|
A | 1 | 2 |
B | 2 | 5 |
C | 5 | 6 |
B | 6 | 7 |
D | 7 |
This is a gaps and islands
problem and you may use the following approaches to solve this.
SELECT
stage_name,
MIN(from_day) as from_day,
MAX(to_day) as to_day
FROM (
SELECT
*,
ROW_NUMBER() OVER (ORDER BY from_day) - ROW_NUMBER() OVER (PARTITION BY stage_name ORDER BY from_day) as grp
FROM
my_table
) t1
GROUP BY stage_name,grp
ORDER BY from_day, stage_name;
stage_name | from_day | to_day |
---|---|---|
A | 1 | 2 |
B | 2 | 5 |
C | 5 | 6 |
B | 6 | 7 |
D | 7 |
View working demo on DB Fiddle
SELECT
stage_name,
MIN(from_day) as from_day,
MAX(to_day) as to_day
FROM (
SELECT
*,
SUM(same) OVER (PARTITION BY stage_name ORDER BY from_day) as grp
FROM (
SELECT
*,
CASE
WHEN from_day = LAG(to_day,1,from_day) OVER (PARTITION BY stage_name ORDER BY from_day) THEN 0
ELSE 1
END as same
FROM
my_table
) t1
) t2
GROUP BY stage_name,grp
ORDER BY from_day, stage_name;
stage_name | from_day | to_day |
---|---|---|
A | 1 | 2 |
B | 2 | 5 |
C | 5 | 6 |
B | 6 | 7 |
D | 7 |
View working demo on DB Fiddle
The example above uses from_day = LAG(to_day,1,from_day) OVER (PARTITION BY stage_name ORDER BY from_day)
to determine if the current rows from_day
is the same as the previous rows to_day
within the same stage_name
group/partition ordered by from_day
. If it is the same 0 is assigned otherwise 1. The output of this subquery has been included below for your perusal:
SELECT
*,
CASE
WHEN from_day = LAG(to_day,1,from_day) OVER (PARTITION BY stage_name ORDER BY from_day) THEN 0
ELSE 1
END as same
FROM
my_table
ORDER BY from_day, stage_name;
stage_name | from_day | to_day | same |
---|---|---|---|
A | 1 | 2 | 0 |
B | 2 | 3 | 0 |
B | 3 | 4 | 0 |
B | 4 | 5 | 0 |
C | 5 | 6 | 0 |
B | 6 | 7 | 1 |
D | 7 | 0 |
The window function SUM(same) OVER (PARTITION BY stage_name ORDER BY from_day)
is then used to find the cumulative sum of these differences, thus creating groups:
SELECT
*,
SUM(same) OVER (PARTITION BY stage_name ORDER BY from_day) as grp
FROM (
SELECT
*,
CASE
WHEN from_day = LAG(to_day,1,from_day) OVER (PARTITION BY stage_name ORDER BY from_day) THEN 0
ELSE 1
END as same
FROM
my_table
) t1
ORDER BY from_day, stage_name;
stage_name | from_day | to_day | same | grp |
---|---|---|---|---|
A | 1 | 2 | 0 | 0 |
B | 2 | 3 | 0 | 0 |
B | 3 | 4 | 0 | 0 |
B | 4 | 5 | 0 | 0 |
C | 5 | 6 | 0 | 0 |
B | 6 | 7 | 1 | 1 |
D | 7 | 0 | 0 |
Finally by grouping by the stage_name
and grp
we can find the desired values in each group, in this case the earliest from_day
using MIN
and the latest to_day
using MAX