Search code examples
sqlsql-serversnowflake-cloud-data-platformgrouping

Get min and max datetimes from each subgroup order by starttimes


I basically want to group the set by timeline in SQL, I am so out of ideas right now obviously group by does not work and so does row number.

Any ideas for SQL are really appreciated.

select shift_date,associate_id,name,description , min(START_TRAN_DATE) as startdate, max(end_tran_date) as end_date
from ltu_vt 
group by shift_date,associate_id,name,description
**SHIFT_DATE    ID  NAME            DESC    START_TRAN_DATE         END_TRAN_DATE**
2022-11-13  42  John Doe    ADP 2022-11-13 06:31:00.000 2022-11-13 06:31:22.000
2022-11-13  42  John Doe    LINE    2022-11-13 06:31:22.000 2022-11-13 06:50:13.000
2022-11-13  42  John Doe    HJ  2022-11-13 06:50:13.000 2022-11-13 06:50:13.000
2022-11-13  42  John Doe    HJ  2022-11-13 06:52:13.000 2022-11-13 06:52:13.000
2022-11-13  42  John Doe    HJ  2022-11-13 06:52:20.000 2022-11-13 06:52:20.000
2022-11-13  42  John Doe    HJ  2022-11-13 06:52:25.000 2022-11-13 06:52:25.000
2022-11-13  42  John Doe    HJ  2022-11-13 06:52:46.000 2022-11-13 06:52:46.000
2022-11-13  42  John Doe    BG  2022-11-13 06:53:58.000 2022-11-13 06:53:58.000
2022-11-13  42  John Doe    BG  2022-11-13 06:54:01.000 2022-11-13 06:54:01.000
2022-11-13  42  John Doe    HJ  2022-11-13 07:13:49.000 2022-11-13 07:13:49.000
2022-11-13  42  John Doe    P2L 2022-11-13 07:14:09.000 2022-11-13 07:14:09.000
2022-11-13  42  John Doe    P2L 2022-11-13 07:19:48.000 2022-11-13 07:19:48.000
2022-11-13  42  John Doe    ADP 2022-11-13 07:20:00.000 2022-11-13 07:20:00.000

expected output is

**SHIFT_DATE    ID  NAME            DESC   START_TRAN_DATE          END_TRAN_DATE**
2022-11-13  42  John Doe    ADP 2022-11-13 06:31:00.000 2022-11-13 06:31:22.000
2022-11-13  42  John Doe    LINE    2022-11-13 06:31:22.000 2022-11-13 06:50:13.000
2022-11-13  42  John Doe    HJ  2022-11-13 06:50:13.000 2022-11-13 06:52:46.000
2022-11-13  42  John Doe    BG  2022-11-13 06:53:58.000 2022-11-13 06:54:01.000
2022-11-13  42  John Doe    HJ  2022-11-13 07:13:49.000 2022-11-13 07:13:49.000
2022-11-13  42  John Doe    P2L 2022-11-13 07:14:09.000 2022-11-13 07:19:48.000
2022-11-13  42  John Doe    ADP 2022-11-13 07:20:00.000 2022-11-13 07:20:00.000

Solution

  • Please try the following solution.

    It is a well known "gaps and islands" problem.

    SQL

    -- DDL and sample data population, start
    DECLARE @tbl TABLE (SHIFT_DATE DATE, ID INT, [NAME] VARCHAR(20), [DESC] varchar(10), START_TRAN_DATE datetime, END_TRAN_DATE DATETIME);
    INSERT @tbl (SHIFT_DATE, ID, NAME, [DESC], START_TRAN_DATE, END_TRAN_DATE) VALUES
    ('2022-11-13', 42, 'John Doe', 'ADP', '2022-11-13 06:31:00.000', '2022-11-13 06:31:22.000'),
    ('2022-11-13', 42, 'John Doe', 'LINE','2022-11-13 06:31:22.000', '2022-11-13 06:50:13.000'),
    ('2022-11-13', 42, 'John Doe', 'HJ', '2022-11-13 06:50:13.000',  '2022-11-13 06:50:13.000'),
    ('2022-11-13', 42, 'John Doe', 'HJ', '2022-11-13 06:52:13.000',  '2022-11-13 06:52:13.000'),
    ('2022-11-13', 42, 'John Doe', 'HJ', '2022-11-13 06:52:20.000',  '2022-11-13 06:52:20.000'),
    ('2022-11-13', 42, 'John Doe', 'HJ', '2022-11-13 06:52:25.000',  '2022-11-13 06:52:25.000'),
    ('2022-11-13', 42, 'John Doe', 'HJ', '2022-11-13 06:52:46.000',  '2022-11-13 06:52:46.000'),
    ('2022-11-13', 42, 'John Doe', 'BG', '2022-11-13 06:53:58.000',  '2022-11-13 06:53:58.000'),
    ('2022-11-13', 42, 'John Doe', 'BG', '2022-11-13 06:54:01.000',  '2022-11-13 06:54:01.000'),
    ('2022-11-13', 42, 'John Doe', 'HJ', '2022-11-13 07:13:49.000',  '2022-11-13 07:13:49.000'),
    ('2022-11-13', 42, 'John Doe', 'P2L', '2022-11-13 07:14:09.000', '2022-11-13 07:14:09.000'),
    ('2022-11-13', 42, 'John Doe', 'P2L', '2022-11-13 07:19:48.000', '2022-11-13 07:19:48.000'),
    ('2022-11-13', 42, 'John Doe', 'ADP', '2022-11-13 07:20:00.000', '2022-11-13 07:20:00.000');
    -- DDL and sample data population, end
    
    SELECT series, SHIFT_DATE, ID, NAME, [DESC], MIN(START_TRAN_DATE) AS Date_Start, MAX(END_TRAN_DATE) AS Date_End, COUNT(SHIFT_DATE) AS Shift_Counter
    FROM
    (
        SELECT *,
               SUM(IIF([DESC] <> ns, 1, 0)) OVER (ORDER BY START_TRAN_DATE) AS series
        FROM
        (
            SELECT series.*,
                   LAG([DESC]) OVER (ORDER BY START_TRAN_DATE) AS ns
            FROM @tbl AS series
        ) q
    ) q
    GROUP BY series, SHIFT_DATE, ID, NAME, [DESC]
    ORDER BY series;
    

    Output

    series SHIFT_DATE ID NAME DESC Date_Start Date_End Shift_Counter
    0 2022-11-13 42 John Doe ADP 2022-11-13 06:31:00.000 2022-11-13 06:31:22.000 1
    1 2022-11-13 42 John Doe LINE 2022-11-13 06:31:22.000 2022-11-13 06:50:13.000 1
    2 2022-11-13 42 John Doe HJ 2022-11-13 06:50:13.000 2022-11-13 06:52:46.000 5
    3 2022-11-13 42 John Doe BG 2022-11-13 06:53:58.000 2022-11-13 06:54:01.000 2
    4 2022-11-13 42 John Doe HJ 2022-11-13 07:13:49.000 2022-11-13 07:13:49.000 1
    5 2022-11-13 42 John Doe P2L 2022-11-13 07:14:09.000 2022-11-13 07:19:48.000 2
    6 2022-11-13 42 John Doe ADP 2022-11-13 07:20:00.000 2022-11-13 07:20:00.000 1