Search code examples
mysqlsqlintervalsgaps-and-islands

Get time intervals where the process was on each stage on Jinja SQL


I need to get the intervals where a process was on each stage, and take into account when the process went back to a stage. For example:

stage_name from_day to_day
A 1 2
B 2 3
B 3 4
B 4 5
C 5 6
B 6 7
D 7

the process is currently on stage D

I want to get a table like this

stage_name from_day to_day
A 1 2
B 2 5
C 5 6
B 6 7
D 7

Solution

  • This is a gaps and islands problem and you may use the following approaches to solve this.

    Approach 1

    SELECT
        stage_name,
        MIN(from_day) as from_day,
        MAX(to_day) as to_day
    FROM (
        SELECT
            *,
           ROW_NUMBER() OVER (ORDER BY from_day) - ROW_NUMBER() OVER (PARTITION  BY stage_name ORDER BY from_day) as grp
            FROM
                my_table
        ) t1
    
    GROUP BY stage_name,grp
    ORDER BY from_day, stage_name;
    
    stage_name from_day to_day
    A 1 2
    B 2 5
    C 5 6
    B 6 7
    D 7

    View working demo on DB Fiddle

    Approach 2

    SELECT
        stage_name,
        MIN(from_day) as from_day,
        MAX(to_day) as to_day
    FROM (
        SELECT
            *,
           SUM(same) OVER (PARTITION BY stage_name ORDER BY from_day) as grp
        FROM (
            SELECT
                *,
                CASE 
                    WHEN from_day = LAG(to_day,1,from_day) OVER (PARTITION BY stage_name ORDER BY from_day) THEN 0
                    ELSE 1
                END as same
            FROM
                my_table
        ) t1
    ) t2
    GROUP BY stage_name,grp
    ORDER BY from_day, stage_name;
    
    stage_name from_day to_day
    A 1 2
    B 2 5
    C 5 6
    B 6 7
    D 7

    View working demo on DB Fiddle

    The example above uses from_day = LAG(to_day,1,from_day) OVER (PARTITION BY stage_name ORDER BY from_day) to determine if the current rows from_day is the same as the previous rows to_day within the same stage_name group/partition ordered by from_day. If it is the same 0 is assigned otherwise 1. The output of this subquery has been included below for your perusal:

    SELECT
        *,
        CASE 
            WHEN from_day = LAG(to_day,1,from_day) OVER (PARTITION BY stage_name ORDER BY from_day) THEN 0
            ELSE 1
        END as same
    FROM
        my_table
    ORDER BY from_day, stage_name;
    
    stage_name from_day to_day same
    A 1 2 0
    B 2 3 0
    B 3 4 0
    B 4 5 0
    C 5 6 0
    B 6 7 1
    D 7 0

    The window function SUM(same) OVER (PARTITION BY stage_name ORDER BY from_day) is then used to find the cumulative sum of these differences, thus creating groups:

        SELECT
            *,
            SUM(same) OVER (PARTITION BY stage_name ORDER BY from_day) as grp
        FROM (
                SELECT
                    *,
                    CASE 
                        WHEN from_day = LAG(to_day,1,from_day) OVER (PARTITION BY stage_name ORDER BY from_day) THEN 0
                        ELSE 1
                    END as same
                FROM
                    my_table
            ) t1
         ORDER BY from_day, stage_name;
    
    stage_name from_day to_day same grp
    A 1 2 0 0
    B 2 3 0 0
    B 3 4 0 0
    B 4 5 0 0
    C 5 6 0 0
    B 6 7 1 1
    D 7 0 0

    Finally by grouping by the stage_name and grp we can find the desired values in each group, in this case the earliest from_day using MIN and the latest to_day using MAX