Search code examples
pythonpandasdatetimestamputc

How to get total hours during working hours?


I have two columns in pandas that are Timestamps as UTC: StartDate and EndDate. The dates are for the task that was submitted for help. I am trying to calculate how many hours did these task exist until finished during work hours. My company is open from 8 am to 6 pm PST Monday-Friday.

I have attempted to make these calculations but I have run into a problem. I don't know how to only get the hours during work time. I have figured out how to the the hours overthe entire time period, but I don't know how to move forward with this. My current thought is to make an if statment in a for loop and check each value of the SLATable['Date Responded (Hours)'] column that is generated from my calculations, but last time I tried to edit value is in column indiviually python throwed me an error.

#8-6 PST to UTC time
officeOpen = pd.Timestamp("8:00:00.000").tz_localize('US/Pacific').tz_convert('utc')
officeClose = pd.Timestamp("18:00:00.000").tz_localize('US/Pacific').tz_convert('utc')

#get data from sql server
SLATable = pd.read_sql_query(SqlQuery,conn)

#calculate Date Responded
SLATable['Date Responded (Hours)'] = SLATable['EndDate'] - SLATable['StartDate']
SLATable['Date Responded (Hours)'] = round(SLATable['Date Responded (Hours)']/np.timedelta64(1,'h'), 2)

Currently, if I use the code above it would work for any task that was finished on the same day as it was created if it was created during working hours, but task created Monday and was finished Tuesday would have hours outside of work hours. Also if a task is created outside of office hours it would collect time untiled we addressed it during working hours.

These calculations are not designed to take in holidays of any country just Monday to Friday from 8 to 6.

example data if run through my calculation:

StartDate       EndDate       Date Responded (Hours)
2016-05-03      2016-05-03    0.13
15:51:11.850    15:59:13.017

2016-05-05      2016-05-06    17.64
23:01:51.023    16:40:21.350

what the output should be if calculated correctly:

StartDate       EndDate       Date Responded (Hours)
2016-05-03      2016-05-03    0.13
15:51:11.850    15:59:13.017

2016-05-05      2016-05-06    0.32
23:01:51.023    16:40:21.350

Solution

  • The first step is to define a BusinessHour offset, with proper start / end hours (it will be used later):

    bhOffs = pd.offsets.BusinessHour(start='08:00', end='18:00')
    

    Then define a function computing business time from UTC time, with proper TimeZone offset:

    def BusTime(ts, hOffs, fwd):
        '''Compute business time. Params:
        ts    - UTC time (string or Timestamp)
        hOffs - Hour offset (int)
        fwd   - Roll variant (forward / backward, bool)
        '''
        tsWrk = ts if type(ts) == 'str' else pd.Timestamp(ts)
        tsOffs = tsWrk + np.timedelta64(hOffs, 'h')
        if fwd:  # Roll if on End of Day
            tsRoll = bhOffs.rollforward(tsOffs + np.timedelta64(1, 'ms'))
        else:    # Don't roll if on End of Day
            tsRoll = bhOffs.rollforward(tsOffs - np.timedelta64(1, 'ms'))
        return tsRoll if tsRoll.day != tsOffs.day else tsOffs
    

    And as the last step, define a function computing business hours:

    def BusHrs(ts1, ts2, hOffs=0):
        '''Compute business hours between 2 DateTimes. Params:
        ts1, ts2 - From / To (UTC, Timestamp or string)
        hOffs    - Hour offset (int)
        '''
        t1 = BusTime(ts1, hOffs, True)
        t2 = BusTime(ts2, hOffs, False)
        bHrs = pd.date_range(start=t1.floor('h'), end=t2.floor('h'),
            freq=bhOffs, closed='left').size
        frac1 = t1 - t1.normalize() - np.timedelta64(t1.hour, 'h')
        frac2 = t2 - t2.normalize() - np.timedelta64(t2.hour, 'h')
        return bHrs + (frac2 - frac1) / np.timedelta64(1, 'h')
    

    The idea is to:

    • Convert UTC start / end hours to proper TimeZone.
    • Generate a DatetimeIndex between local start / end times, rounded down to full hours.
    • Take the number of full hours from the size of this index.
    • Correct by fractional parts of hour, from start / end time.

    I performed the test on your data:

    • BusHrs('2016-05-03 15:51:11.850', '2016-05-03 15:59:13.017', -7) - result 0.1336575.
    • BusHrs('2016-05-05 23:01:51.023', '2016-05-06 16:40:21.350', -7) - result 3.6417574999999998.

    The second result is different from your expected result, but the rationale is as follows:

    • Start time: 2016-05-05 23:01 UTC is 2016-05-05 16:01 (Pacific).
    • End time: 2016-05-06 16:40 UTC is 2016-05-06 09:40 (Pacific).
    • Work time on 2016-05-05 is almost 2 h (up to 18:00).
    • Work time on 2016-05-06 is almost 1 h 40 m (from 8:00).
    • Sum of both work times is just 3.64....

    I didn't test this function on your third set of start / end times, as there is probably something wrong with it (the actual work time is far above your expected result).