Search code examples
sumteradata

Teradata - Group By consecutive dates in a date range without a common ID to group by


I am trying to sum the days on the job by employee:

  • When on_job_ind = Y
  • NOT total days on job for employee, but by date groupings
  • There is not any kind of job_id to GROUP BY, because that would make my life too easy

As you can see in the table below, my SUM() is summing everthing where on_job_ind = 'Y'. The last column in the table below, correct_sum is how the days_on_job need to be totaled up by date groups.

I do not know how to "break" the summation by date groups when encountering a row where on_job_ind = 'N'.

I have chased my tail using LAG and LEAD to no avail.

Is what I'm trying to do possible in SQL?

My SQL thus far:


SELECT emp_nbr
       , job_start_date AS act_job_start_dt
       , job_end_dt AS act_job_end_dt
       , MIN(job_start_date) over (PARTITION BY emp_nbr, on_job_ind ORDER BY job_end_dt ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS min_job_start_dt
       , MAX(job_end_dt) over (PARTITION BY emp_nbr, on_job_ind ORDER BY job_end_dt ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS max_job_end_dt
       , on_job_ind
       , days_on_job
       , SUM(days_on_job) over (PARTITION BY emp_nbr, on_job_ind ORDER BY job_end_dt ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_days_on_job
  FROM emp_job_tbl a
 ORDER BY 1, 3 DESC

Table data:

emp_nbr act_job_start_dt act_job_end_dt min_job_start_dt min_job_end_dt on_job_ind days_on_job total_days_on_job correct_sum
15968 2/19/2023 3/3/2023 3/2/2022 3/3/2023 Y 12 266 12
15968 2/15/2023 2/19/2023 3/16/2022 2/19/2023 N 0 0
15968 2/1/2023 2/15/2023 3/2/2022 2/15/2023 Y 14 254 28
15968 1/18/2023 2/1/2023 3/2/2022 2/1/2023 Y 14 240
15968 1/7/2023 1/18/2023 3/16/2022 1/18/2023 N 0 0
15968 1/2/2023 1/7/2023 3/2/2022 1/7/2023 Y 5 226 18
15968 12/30/2022 1/2/2023 3/2/2022 1/2/2023 Y 3 221
15968 12/20/2022 12/30/2022 3/2/2022 12/30/2022 Y 10 218
15968 12/19/2022 12/20/2022 3/16/2022 12/20/2022 N 0 0
15968 12/16/2022 12/19/2022 3/2/2022 12/19/2022 Y 3 208 30
15968 12/1/2022 12/16/2022 3/2/2022 12/16/2022 Y 15 205
15968 11/19/2022 12/1/2022 3/2/2022 12/1/2022 Y 12 190
15968 11/1/2022 11/19/2022 3/16/2022 11/19/2022 N 0 0
15968 10/24/2022 11/1/2022 3/16/2022 11/1/2022 N 0 0
15968 10/7/2022 10/24/2022 3/16/2022 10/24/2022 N 0 0
15968 10/2/2022 10/7/2022 3/2/2022 10/7/2022 Y 5 178 19
15968 9/18/2022 10/2/2022 3/2/2022 10/2/2022 Y 14 173
15968 9/14/2022 9/18/2022 3/16/2022 9/18/2022 N 0 0
15968 9/1/2022 9/14/2022 3/2/2022 9/14/2022 Y 13 159 57
15968 8/18/2022 9/1/2022 3/2/2022 9/1/2022 Y 14 146
15968 8/1/2022 8/18/2022 3/2/2022 8/18/2022 Y 17 132
15968 7/19/2022 8/1/2022 3/2/2022 8/1/2022 Y 13 115
15968 7/1/2022 7/19/2022 3/16/2022 7/19/2022 N 0 0
15968 6/30/2022 7/1/2022 3/16/2022 7/1/2022 N 0 0
15968 6/20/2022 6/30/2022 3/2/2022 6/30/2022 Y 10 102 10
15968 6/11/2022 6/20/2022 3/16/2022 6/20/2022 N 0 0
15968 6/9/2022 6/11/2022 3/16/2022 6/11/2022 N 0 0
15968 6/1/2022 6/9/2022 3/2/2022 6/9/2022 Y 8 92 53
15968 5/30/2022 6/1/2022 3/2/2022 6/1/2022 Y 2 84
15968 5/19/2022 5/30/2022 3/2/2022 5/30/2022 Y 11 82
15968 5/2/2022 5/19/2022 3/2/2022 5/19/2022 Y 17 71
15968 4/17/2022 5/2/2022 3/2/2022 5/2/2022 Y 15 54
15968 4/14/2022 4/17/2022 3/16/2022 4/17/2022 N 0 0
15968 3/30/2022 4/14/2022 3/2/2022 4/14/2022 Y 15 39 25
15968 3/20/2022 3/30/2022 3/2/2022 3/30/2022 Y 10 24
15968 3/16/2022 3/20/2022 3/16/2022 3/20/2022 N 0 0
15968 3/2/2022 3/16/2022 3/2/2022 3/16/2022 Y 14 14 14

Solution

  • The trick to this one is that you actually have to consolidate 2 of the rows based on the fact that one is ending at the same time another one is beginning.

    What I do is start by flagging a row for whether or not is it a "start" of a group or not. In this case, I used a window function. (Line 14)

    If for some reason you can't get this to work with Teradata, please just message me because I know I've solved this on Teradata before. I no longer have access to Teradata so I tried my best. But I have solved this exact problem on Teradata many times so I know it's possible.

    Here is the code that gives the result you're looking for, using Snowflake syntax. Hopefully only minor adjustments are needed.

    For more information about solving this problem, the keywords to google would be "Consolidating overlapping time ranges" or the "Gaps and Islands Problem."

    WITH VALID_DATA AS (
      SELECT 
        * 
      FROM 
        RASGO.STACKOVERFLOW_Q75700970 
      WHERE 
        ON_JOB_IND = 'Y'
    ), 
    GROUP_STARTS AS (
      SELECT 
        EMP_NBR, 
        ACT_JOB_START_DT, 
        ACT_JOB_END_DT, 
        CASE WHEN ACT_JOB_START_DT > max(ACT_JOB_END_DT) OVER (
          PARTITION BY EMP_NBR 
          ORDER BY 
            ACT_JOB_START_DT, 
            ACT_JOB_END_DT ROWS BETWEEN UNBOUNDED preceding 
            AND 1 preceding
        ) THEN 1 ELSE 0 END GROUP_START 
      FROM 
        VALID_DATA
    ), 
    GROUPS AS (
      SELECT 
        EMP_NBR, 
        ACT_JOB_START_DT, 
        ACT_JOB_END_DT, 
        sum(GROUP_START) OVER (
          PARTITION BY EMP_NBR 
          ORDER BY 
            ACT_JOB_START_DT, 
            ACT_JOB_END_DT
        ) grp 
      FROM 
        GROUP_STARTS
    ), 
    CONSOLIDATED AS (
      SELECT 
        EMP_NBR, 
        min(ACT_JOB_START_DT) GROUP_START_DT, 
        max(ACT_JOB_END_DT) GROUP_END_DT 
      FROM 
        GROUPS 
      GROUP BY 
        EMP_NBR, 
        grp
    ) 
    SELECT 
      *, 
      DATEDIFF(
        'DAY', GROUP_START_DT, GROUP_END_DT
      ) AS total_days_on_job 
    FROM 
      CONSOLIDATED 
    ORDER BY 
      2
    

    Result:

    
    +---------+----------------+--------------+-------------------+
    | EMP_NBR | GROUP_START_DT | GROUP_END_DT | TOTAL_DAYS_ON_JOB |
    +---------+----------------+--------------+-------------------+
    |   15968 | 3/2/22         | 3/16/22      |                14 |
    |   15968 | 3/20/22        | 4/14/22      |                25 |
    |   15968 | 4/17/22        | 6/9/22       |                53 |
    |   15968 | 6/20/22        | 6/30/22      |                10 |
    |   15968 | 7/19/22        | 9/14/22      |                57 |
    |   15968 | 9/18/22        | 10/7/22      |                19 |
    |   15968 | 11/19/22       | 12/19/22     |                30 |
    |   15968 | 12/20/22       | 1/7/23       |                18 |
    |   15968 | 1/18/23        | 2/15/23      |                28 |
    |   15968 | 2/19/23        | 3/3/23       |                12 |
    +---------+----------------+--------------+-------------------+