Search code examples
sqlsnowflake-cloud-data-platformdbt

Calculating split labor hours?


I need to calculate split labor hours for some reports and have been struggling with it. Our Oracle database currently does this using a stored procedure in a package, but I am working within the confines of Snowflake/dbt for this project and while I could just bring the data from a view in Oracle over to Snowflake, I'm wanting to do some transformations on the time prior to recalculating the split time, as we frequently have people forgetting to clock out and I want to correct these to the end of their shift. So I have to recalculate split time after adjusting the erroneous clock outs.

My data looks like this:

emp_id task_id time_in time_out
43 159221 2021-09-30 09:00:00 2021-09-30 14:30:00
43 188239 2021-09-30 09:16:03 2021-09-30 10:44:51
43 177741 2021-09-30 13:12:49 2021-09-30 14:14:32

Stopping here and summing up the hours gets:

emp_id task_id hrs
43 159221 5.5
43 188239 1.47
43 177741 1.03

This is wrong because some of the time spent was shared with other tasks and should be split.

What I need to do is if we were calculating the total hours for the first task, this would be the formula:

.267 + (1.47 / 2) + 2.47 + (1.03 / 2) + .27 = 4.257 hours worked on task_id 159221.

Is there any way to handle this in SQL or will I have to learn JavaScript and write a stored procedure?

Additional details:

  • The # of tasks that can overlap is variable, it's not only up to 2.
  • Employees may clock into the same task multiple times throughout the day
  • Multiple different employees may be clocked in the same task simultaneously.

Solution

  • You may try the following.

    A LEFT JOIN was used to determine overlapping duration before aggregating the results to subtract the total overlapping duration from the actual task duration.

    SELECT
         emp_id,
         task_id,
         MAX(task_duration) - SUM(overlap_duration/2) as hours
    FROM (
        SELECT
            t1.emp_id,
            t1.task_id,
            t1.time_in,
            t1.time_out,
            (datediff(seconds, t1.time_in, t1.time_out)/3600) as task_duration,
             CASE
                 WHEN t2.task_id IS NULL THEN 0
                 ELSE (datediff(seconds,t2.time_in,t2.time_out)/3600)
             END as overlap_duration,
             t2.task_id as overlap_task
        FROM
            task_times t1
        LEFT JOIN
            task_times t2 ON t1.emp_id = t2.emp_id AND
                             t1.task_id < t2.task_id AND
                             (
                          (t1.time_in between t2.time_in and t2.time_out) or
      (t1.time_out between t2.time_in and t2.time_out) or
      (t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
      (t1.time_in > t2.time_in and t1.time_out < t2.time_out)
                          )
                          
    ) t3
    GROUP BY
        emp_id, task_id;
    
    emp_id task_id hours
    43 159221 4.245694444444444
    43 177741 1.0286111111111111
    43 188239 1.48

    View working demo on DB Fiddle

    I've also included the output of the subquery below for your perusal

    SELECT
            t1.emp_id,
            t1.task_id,
            t1.time_in,
            t1.time_out,
            (datediff(seconds, t1.time_in, t1.time_out)/3600) as task_duration,
             CASE
                 WHEN t2.task_id IS NULL THEN 0
                 ELSE (datediff(seconds,t2.time_in,t2.time_out)/3600)
             END as overlap_duration,
             t2.task_id as overlap_task
        FROM
            task_times t1
        LEFT JOIN
            task_times t2 ON t1.emp_id = t2.emp_id AND
                             t1.task_id < t2.task_id AND
                             (
                          (t1.time_in between t2.time_in and t2.time_out) or
      (t1.time_out between t2.time_in and t2.time_out) or
      (t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
      (t1.time_in > t2.time_in and t1.time_out < t2.time_out)
                          );
    
    emp_id task_id time_in time_out task_duration overlap_duration overlap_task
    43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.48 188239
    43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.0286111111111111 177741
    43 188239 2021-09-30T09:16:03.000Z 2021-09-30T10:44:51.000Z 1.48 0
    43 177741 2021-09-30T13:12:49.000Z 2021-09-30T14:14:32.000Z 1.0286111111111111 0

    View demo on DB Fiddle

    Edit 1

    Responding to question update

    Additional details:

    1. The # of tasks that can overlap is variable, it's not only up to 2.
    2. Employees may clock into the same task multiple times throughout the day

    I've added more sample data and provided an updated fiddle. I'm unable to provide a demo fiddle with snowflake but we can change the EXTRACT to DATEDIFF as shown earlier.

    CREATE TABLE task_times (
      emp_id INTEGER,
      task_id INTEGER,
      time_in TIMESTAMP,
      time_out TIMESTAMP
    );
    
    INSERT INTO task_times
      (emp_id, task_id, time_in, time_out)
    VALUES
    -- add repetition of the same tasks with overlaps
      ('43', '159221', '2021-09-30 09:00:00', '2021-09-30 14:30:00'),
      ('43', '159221', '2021-09-30 09:03:00', '2021-09-30 14:27:00'),
      ('43', '159221', '2021-09-30 09:03:00', '2021-09-30 14:38:00'),
    --- add overlap of different tasks
      ('43', '188239', '2021-09-30 09:16:03', '2021-09-30 10:44:51'),
      ('43', '188239', '2021-09-30 09:16:03', '2021-09-30 10:43:51'),
      ('43', '177741', '2021-09-30 13:12:49', '2021-09-30 14:14:32'),
    -- add multiple overalapping tasks with a difference of 10 minutes
      ('43', '333333', '2021-10-30 09:00:00', '2021-10-30 12:40:00'),
      ('43', '333334', '2021-10-30 09:00:00', '2021-10-30 12:30:00'),
      ('43', '333335', '2021-10-30 09:00:00', '2021-10-30 12:20:00'),
      ('43', '333336', '2021-10-30 09:00:00', '2021-10-30 12:10:00'),
    -- add multiple log in and log out times with no overlaps
      ('43', '333337', '2021-10-30 09:00:00', '2021-10-30 12:00:00'),
      ('43', '333337', '2021-10-30 12:30:00', '2021-10-30 13:00:00'),
      ('43', '333337', '2021-10-30 13:30:00', '2021-10-30 14:00:00')
      ;
    

    Subquery to determine overlaps (for debugging purposes)

    SELECT
            t1.emp_id,
            t1.task_id,
            t1.time_in,
            t1.time_out,
            (EXTRACT(epoch FROM t1.time_out-t1.time_in)/3600) as task_duration,
             CASE
                 WHEN t2.task_id IS NULL THEN 0
                 ELSE (EXTRACT(epoch FROM t2.time_out-t2.time_in)/3600)
             END as overlap_duration,
             t2.task_id as overlap_task
        FROM
            task_times t1
        LEFT JOIN
            task_times t2 ON t1.emp_id = t2.emp_id AND
                             t1.task_id < t2.task_id AND
                             (
                          (t1.time_in between t2.time_in and t2.time_out) or
      (t1.time_out between t2.time_in and t2.time_out) or
      (t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
      (t1.time_in > t2.time_in and t1.time_out < t2.time_out)
                          );
    
    emp_id task_id time_in time_out task_duration overlap_duration overlap_task
    43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.48 188239
    43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.4633333333333334 188239
    43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.0286111111111111 177741
    43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:27:00.000Z 5.4 1.48 188239
    43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:27:00.000Z 5.4 1.4633333333333334 188239
    43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:27:00.000Z 5.4 1.0286111111111111 177741
    43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:38:00.000Z 5.583333333333333 1.48 188239
    43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:38:00.000Z 5.583333333333333 1.4633333333333334 188239
    43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:38:00.000Z 5.583333333333333 1.0286111111111111 177741
    43 188239 2021-09-30T09:16:03.000Z 2021-09-30T10:44:51.000Z 1.48 0
    43 188239 2021-09-30T09:16:03.000Z 2021-09-30T10:43:51.000Z 1.4633333333333334 0
    43 177741 2021-09-30T13:12:49.000Z 2021-09-30T14:14:32.000Z 1.0286111111111111 0
    43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 3.5 333334
    43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 3.3333333333333335 333335
    43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 3.1666666666666665 333336
    43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 3 333337
    43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 0.5 333337
    43 333334 2021-10-30T09:00:00.000Z 2021-10-30T12:30:00.000Z 3.5 3.3333333333333335 333335
    43 333334 2021-10-30T09:00:00.000Z 2021-10-30T12:30:00.000Z 3.5 3.1666666666666665 333336
    43 333334 2021-10-30T09:00:00.000Z 2021-10-30T12:30:00.000Z 3.5 3 333337
    43 333334 2021-10-30T09:00:00.000Z 2021-10-30T12:30:00.000Z 3.5 0.5 333337
    43 333335 2021-10-30T09:00:00.000Z 2021-10-30T12:20:00.000Z 3.3333333333333335 3.1666666666666665 333336
    43 333335 2021-10-30T09:00:00.000Z 2021-10-30T12:20:00.000Z 3.3333333333333335 3 333337
    43 333336 2021-10-30T09:00:00.000Z 2021-10-30T12:10:00.000Z 3.1666666666666665 3 333337
    43 333337 2021-10-30T09:00:00.000Z 2021-10-30T12:00:00.000Z 3 0
    43 333337 2021-10-30T12:30:00.000Z 2021-10-30T13:00:00.000Z 0.5 0
    43 333337 2021-10-30T13:30:00.000Z 2021-10-30T14:00:00.000Z 0.5 0

    Actual Query (I've only including the mins column for debugging purposes)

    SELECT
         emp_id,
         task_id,
         MAX(CASE
             WHEN overlap_duration>0 THEN task_duration
             ELSE 0
         END ) + SUM(CASE
             WHEN overlap_duration=0 THEN task_duration
             ELSE 0
         END) - AVG(overlap_duration) as hrs,
         (MAX(CASE
             WHEN overlap_duration>0 THEN task_duration
             ELSE 0
         END ) + SUM(CASE
             WHEN overlap_duration=0 THEN task_duration
             ELSE 0
         END) - AVG(overlap_duration))*60 as mins
    FROM (
        SELECT
            t1.emp_id,
            t1.task_id,
            t1.time_in,
            t1.time_out,
            (EXTRACT(epoch FROM t1.time_out-t1.time_in)/3600) as task_duration,
             CASE
                 WHEN t2.task_id IS NULL THEN 0
                 ELSE (EXTRACT(epoch FROM t2.time_out-t2.time_in)/3600)
             END as overlap_duration,
             t2.task_id as overlap_task
        FROM
            task_times t1
        LEFT JOIN
            task_times t2 ON t1.emp_id = t2.emp_id AND
                             t1.task_id < t2.task_id AND
                             (
                          (t1.time_in between t2.time_in and t2.time_out) or
      (t1.time_out between t2.time_in and t2.time_out) or
      (t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
      (t1.time_in > t2.time_in and t1.time_out < t2.time_out)
                          )
                          
    ) t3
    GROUP BY
        emp_id, task_id
    ORDER BY
        emp_id, task_id;
    
    emp_id task_id hrs mins
    43 159221 4.259351851851852 255.5611111111111
    43 177741 1.0286111111111111 61.71666666666667
    43 188239 2.9433333333333334 176.6
    43 333333 0.9666666666666663 57.99999999999998
    43 333334 1 60
    43 333335 0.25000000000000044 15.000000000000027
    43 333336 0.16666666666666652 9.999999999999991
    43 333337 4 240

    View on DB Fiddle

    Let me know if this works for you.