Search code examples
sqlpyspark

Calculations business hours per day - sql


Good morning,

I have a table where each item has an "initial delivery time" and a "final delivery time." I need to calculate how many working hours each delivery took and then aggregate (by the final delivery time) it to see the average working hours my shipments are taking to be delivered each day.

For our business, we consider "business hours from Monday to Friday" as 9 a.m. to 6 p.m. (18:00).

It is possible that the initial time of a shipment is scheduled for any hour outside the business hours. In this case, we must begin counting only when we enter the business hours. For example:

If a shipment's initial delivery is set for Thursday 7 p.m. and the final time ends up being Friday 10 a.m., we should consider only 1 hour (from Friday 9 a.m. to Friday 10 a.m., knowing that from Thursday 7 p.m. to Friday 9 a.m. was outside working hours).

Tried to do the query below - but it doesn’t respect all ours business rules (its not aggregating yet)

Unfortunately here in the office we cannot use “variables” in SQL (we use something called Metabase - what makes it harder

Is anybody up to the challenge of helping me?

SELECT 
    SUM(
        TIMESTAMPDIFF(
            HOUR, 
            GREATEST(data_inicial, TIMESTAMP(DATE(data_inicial), '09:00:00')), 
            LEAST(data_final, TIMESTAMP(DATE(data_final), '18:00:00'))
        )
    ) AS hours
FROM table
WHERE WEEKDAY(data_inicial) BETWEEN 0 AND 4
  AND HOUR(data_inicial) < 18
  AND WEEKDAY(data_final) BETWEEN 0 AND 4
  AND HOUR(data_final) >= 9
UNION
SELECT 
    COUNT(*) * 9 AS hours
FROM table
WHERE WEEKDAY(data_inicial) BETWEEN 0 AND 4
  AND HOUR(data_inicial) >= 9
UNION
SELECT 
    COUNT(*) * 9 AS horas_uteis
FROM table
WHERE WEEKDAY(data_final) BETWEEN 0 AND 4
  AND HOUR(data_final) < 18;

Solution

  • For this task, I think, is well to use calendar table. In example, this calendar table created temporarily as CTE.
    You can use permanent table with index,for example on (date,startH,endH,workDay). That table is easy to maintain. Work calendar is helpful for setting any day attributes, like holidays and etc.

    In example aggregate made by the final delivery date

    See example:

    with recursive WorkCalendar as( -- work calendar, sufficient for test data
      select 1 n
        ,date('2023-10-23') dt
        ,case when weekday(date('2023-10-23')) between 0 and 4 then 1 else 0 end workDay
        ,case when weekday(date('2023-10-23')) between 0 and 4 then 9 else 0 end startH
        ,case when weekday(date('2023-10-23')) between 0 and 4 then 18 else 0 end endH
        ,weekday(date('2023-10-23')) wd
        ,week(date('2023-10-23')) w
        ,dayname(date('2023-10-23')) dn
    union all
      select  n+1
        ,date_add(dt, interval 1 day)  
        ,case when weekday(date_add(dt, interval 1 day)) between 0 and 4 then 1 else 0 end workDay
        ,case when weekday(date_add(dt, interval 1 day)) between 0 and 4 then 9 else 0 end startH
        ,case when weekday(date_add(dt, interval 1 day)) between 0 and 4 then 18 else 0 end endH
        ,weekday(date_add(dt, interval 1 day)) wd
        ,week(date_add(dt, interval 1 day)) w
        ,dayname(date_add(dt, interval 1 day)) dn
      from WorkCalendar where date_add(dt, interval 1 day)<date('2023-11-23')
    )
    ,ShipmentDt as( -- subquery for to shorten formulas
    select * 
      ,hour(data_inicial) h1 ,cast(data_inicial as date) dt1
      ,hour(data_final) h2 ,cast(data_final as date) dt2
    from Shipment
    )
    ,ShipmentHours as(
    select id,data_inicial,data_final,c.dt,c.workDay,c.startH,c.endH 
         -- hours in data_inicial
          case when dt2>dt1 then  
            endH-case when h1<startH then startH when h1>endH then endH else h1  end
          else -- intraday
            case when h2<startH then startH when h2>endH then endH else h2  end
           -case when h1<startH then startH when h1>endH then endH else h1  end
          end 
        else 0
        end hours_first
        -- hours in data_final
       ,case when dt2>dt1 and c.dt=dt2 then
            case when h2<startH then 0 when h2>endH then (endH-startH) else h2-startH  end
        else 0
        end hours_last
        -- hours in data between (not inculded) data_inicial and data_final
       ,case when c.dt>dt1 and c.dt<dt2 then
            endH-startH
        else 0
        end hours_inter
    from ShipmentDt sd
    left join WorkCalendar c on c.dt>=sd.dt1 and c.dt<=sd.dt2
    )
    ,ShipmentWorkHours as( -- total work hours for shipment
    select id,min(data_inicial) data_inicial,min(data_final)data_final
      ,sum((hours_first+hours_last+hours_inter)*workDay) as totH
    from ShipmentHours
    group by id
    )
    ,AvgDelivery as( -- average work hours, delivered in date - target task
    select cast(data_final as date) data_final, avg(totH) avgH,count(*) cnt
    from ShipmentWorkHours
    group by cast(data_final as date)
    )
    ,ActiveShipments as( -- additional - count of active shipments by date
    select dt,count(*) cnt
    from ShipmentHours
    group by dt
    )
    ,ActiveDeliveryByDate as( -- additional - active shipments list for date
    select dt,id,data_inicial,data_final
    from ShipmentHours
    where dt=cast('2023-10-25' as date)
    )
     select * from AvgDelivery
    -- select * from ActiveShipments
    -- select * from ActiveDeliveryByDate
    -- select * from ShipmentWorkHours 
    

    Query result with sample data

    data_final avgH cnt
    2023-10-23 9.0000 3
    2023-10-26 14.8000 5
    2023-10-27 17.0000 1
    2023-10-24 8.6667 6
    2023-10-28 36.0000 1
    2023-10-29 37.0000 1
    2023-10-30 46.0000 1
    2023-10-31 55.0000 1
    2023-11-05 82.0000 1
    2023-11-06 90.0000 1

    ActiveShipments example

    dt cnt
    2023-10-23 15
    2023-10-26 12
    2023-10-25 11
    2023-10-27 7
    2023-10-24 12
    2023-10-28 6
    2023-10-29 5
    2023-10-30 4
    2023-10-31 3
    2023-11-05 2
    2023-11-04 2
    2023-11-03 2
    2023-11-02 2
    2023-11-01 2
    2023-11-06 1

    ActiveShipmnetsByDate example

    dt id data_inicial data_final
    2023-10-25 14 2023-10-25 11:00:00 2023-10-26 13:00:00
    2023-10-25 15 2023-10-25 11:00:00 2023-10-26 18:00:00
    2023-10-25 16 2023-10-25 11:00:00 2023-10-26 20:00:00
    2023-10-25 17 2023-10-25 11:00:00 2023-10-26 16:00:00
    2023-10-25 18 2023-10-25 09:00:00 2023-10-26 17:00:00
    2023-10-25 33 2023-10-23 19:00:00 2023-10-28 19:00:00
    2023-10-25 34 2023-10-23 17:00:00 2023-10-29 19:00:00
    2023-10-25 35 2023-10-23 17:00:00 2023-10-30 19:00:00
    2023-10-25 36 2023-10-23 17:00:00 2023-10-31 19:00:00
    2023-10-25 37 2023-10-23 17:00:00 2023-11-05 19:00:00
    2023-10-25 38 2023-10-23 17:00:00 2023-11-06 17:00:00

    WorkCalendar example

    n dt workDay startH endH wd w dn
    1 2023-10-23 1 9 18 0 43 Monday
    2 2023-10-24 1 9 18 1 43 Tuesday
    3 2023-10-25 1 9 18 2 43 Wednesday
    4 2023-10-26 1 9 18 3 43 Thursday
    5 2023-10-27 1 9 18 4 43 Friday
    6 2023-10-28 0 0 0 5 43 Saturday
    7 2023-10-29 0 0 0 6 44 Sunday
    8 2023-10-30 1 9 18 0 44 Monday

    For demo, I'll use sample data

    Create table Shipment (id int,data_inicial datetime,data_final datetime);
    insert into Shipment values
     (11,'2023-10-23 09:00:00','2023-10-23 17:00:00')
    ,(12,'2023-10-23 08:00:00','2023-10-23 12:00:00')
    ,(13,'2023-10-23 08:00:00','2023-10-23 19:00:00')
    ,(14,'2023-10-25 11:00:00','2023-10-26 13:00:00')
    ,(15,'2023-10-25 11:00:00','2023-10-26 18:00:00')
    ,(16,'2023-10-25 11:00:00','2023-10-26 20:00:00')
    ,(17,'2023-10-25 11:00:00','2023-10-26 16:00:00')
    ,(18,'2023-10-25 09:00:00','2023-10-26 17:00:00')
    ,(19,'2023-10-26 09:00:00','2023-10-27 17:00:00')
    ,(21,'2023-10-23 08:00:00','2023-10-24 17:00:00')
    ,(22,'2023-10-23 17:00:00','2023-10-24 17:00:00')
    ,(23,'2023-10-23 17:00:00','2023-10-24 19:00:00')
    ,(24,'2023-10-23 19:00:00','2023-10-24 09:00:00')
    ,(31,'2023-10-23 19:00:00','2023-10-24 17:00:00')
    ,(32,'2023-10-23 19:00:00','2023-10-24 17:00:00')
    ,(33,'2023-10-23 19:00:00','2023-10-28 19:00:00')
    ,(34,'2023-10-23 17:00:00','2023-10-29 19:00:00')
    ,(35,'2023-10-23 17:00:00','2023-10-30 19:00:00')
    ,(36,'2023-10-23 17:00:00','2023-10-31 19:00:00')
    ,(37,'2023-10-23 17:00:00','2023-11-05 19:00:00')
    ,(38,'2023-10-23 17:00:00','2023-11-06 17:00:00')
    ;
    

    Demo here