Search code examples
sqlsnowflake-cloud-data-platform

Best way to work around with repetition of value thru groupby


It's quite hard to explain what I'm experiencing so it's best to just put it over here and explain.

EMPLOYEE_ID EMPLOYEE_SUPERVISOR_ID EFFECTIVE_START_DATE EFFECTIVE_END_DATE
30665 61046 30/04/2021 30/09/2021
30665 8009098 30/09/2021 31/12/2021
30665 8009098 31/12/2021 31/07/2022
30665 73280 31/07/2022 31/08/2022
30665 73280 31/08/2022 30/09/2022
30665 73280 30/09/2022 30/09/2023
30665 73280 03/10/2023 31/10/2023
30665 73280 31/10/2023 01/12/2023
30665 79621 01/12/2023 04/12/2023
30665 79621 01/12/2023 04/12/2023
30665 73280 04/12/2023 15/01/2024
30665 73280 15/01/2024 15/03/2024
30665 73280 15/03/2024 14/05/2024
30665 73280 14/05/2024 15/05/2024
30665 73280 15/05/2024 17/05/2024

Here's the query I have in my attempt to get what I need:

SELECT
    EMPLOYEE_ID,
    SUPERVISOR_ID,
    MIN(EFFECTIVE_START) AS "EFFECTIVE_START",
    MAX(EFFECTIVE_END) AS "EFFECTIVE_END"
FROM
    EMP_TABLE
GROUP BY
    1, 2
ORDER BY
    EFFECTIVE_START ASC

Which resulted in this output; my problem here is the 73280 earn both the min and max:

EMPLOYEE_ID SUPERVISOR_ID EFFECTIVE_START EFFECTIVE_END
30665 61046 30/04/2021 30/09/2021
30665 8009098 30/09/2021 31/07/2022
30665 73280 31/07/2022 01/12/2023
30665 73280 31/07/2022 17/05/2024
30665 79621 01/12/2023 04/12/2023

But my ideal result is this one:

EMPLOYEE_ID SUPERVISOR_ID EFFECTIVE_START EFFECTIVE_END
30665 61046 30/04/2021 30/09/2021
30665 8009098 30/09/2021 31/07/2022
30665 73280 31/07/2022 01/12/2023
30665 79621 01/12/2023 04/12/2023
30665 73280 04/12/2023 17/05/2024

Solution

  • You need to break the grouping whenever there's a gap or change in the effective date ranges for the same supervisor :

    WITH SupervisorGroups AS (
        SELECT
            EMPLOYEE_ID,
            EMPLOYEE_SUPERVISOR_ID,
            EFFECTIVE_START_DATE,
            EFFECTIVE_END_DATE,
            SUM(CASE 
                    WHEN LAG(EMPLOYEE_SUPERVISOR_ID) OVER (PARTITION BY EMPLOYEE_ID ORDER BY EFFECTIVE_START_DATE) = EMPLOYEE_SUPERVISOR_ID 
                         AND LAG(EFFECTIVE_END_DATE) OVER (PARTITION BY EMPLOYEE_ID ORDER BY EFFECTIVE_START_DATE) = EFFECTIVE_START_DATE 
                    THEN 0 
                    ELSE 1 
                END) OVER (PARTITION BY EMPLOYEE_ID ORDER BY EFFECTIVE_START_DATE) AS GroupFlag
        FROM EMP_TABLE
    )
    SELECT
        EMPLOYEE_ID,
        EMPLOYEE_SUPERVISOR_ID AS SUPERVISOR_ID,
        MIN(EFFECTIVE_START_DATE) AS EFFECTIVE_START,
        MAX(EFFECTIVE_END_DATE) AS EFFECTIVE_END
    FROM SupervisorGroups
    GROUP BY EMPLOYEE_ID, EMPLOYEE_SUPERVISOR_ID, GroupFlag
    ORDER BY EFFECTIVE_START ASC;