Tags: python, pandas, dataframe, datetime, timedelta

Find change in time in a pandas data frame


I have the following lists extracted from a pandas DataFrame. They hold a start date and time and an end date and time, and I want to find the difference between the two.

import datetime

start_date = ['29.12.2020', '29.12.2020', '30.12.2020', '30.12.2020', '30.12.2020']
start_time = [datetime.time(11, 10), datetime.time(23, 15), datetime.time(5, 15), datetime.time(11, 15), datetime.time(17, 15)]

end_date = ['29.12.2020', '30.12.2020', '30.12.2020', '30.12.2020', '30.12.2020']
end_time = [datetime.time(23, 15), datetime.time(5, 15), datetime.time(11, 15), datetime.time(17, 15), datetime.time(23, 15)]

I would like to combine each date with its time to get a single start point and a single end point in time, then express the difference between the two in HH:MM format.

For example, for the first row the difference between the two points in time should be 12:05 (12 hours and 5 minutes).
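
As a quick sanity check of that expected value, here is a minimal standard-library sketch for the first row only. The variable names (`start`, `end`, `first_row`) are illustrative and do not come from the original DataFrame.

```python
import datetime

# First row of the lists above: combine the DD.MM.YYYY date string with the time object
start = datetime.datetime.combine(
    datetime.datetime.strptime('29.12.2020', '%d.%m.%Y').date(),
    datetime.time(11, 10),
)
end = datetime.datetime.combine(
    datetime.datetime.strptime('29.12.2020', '%d.%m.%Y').date(),
    datetime.time(23, 15),
)

# Convert the timedelta to whole minutes, then split into hours and minutes
total_minutes = int((end - start).total_seconds() // 60)
hours, minutes = divmod(total_minutes, 60)
first_row = f'{hours:02d}:{minutes:02d}'
print(first_row)  # 12:05
```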


Solution

  • import datetime
    import numpy as np
    import pandas as pd

    start_date = ['29.12.2020', '29.12.2020', '30.12.2020', '30.12.2020', '30.12.2020']
    start_time = [datetime.time(11, 10), datetime.time(23, 15), datetime.time(5, 15), datetime.time(11, 15), datetime.time(17, 15)]
    end_date = ['29.12.2020', '30.12.2020', '30.12.2020', '30.12.2020', '30.12.2020']
    end_time = [datetime.time(23, 15), datetime.time(5, 15), datetime.time(11, 15), datetime.time(17, 15), datetime.time(23, 15)]

    df = pd.DataFrame(data={'start_date': start_date,
                            'start_time': start_time,
                            'end_date': end_date,
                            'end_time': end_time})

    # The dates are day-first (DD.MM.YYYY), so pass an explicit format instead of relying on inference
    fmt = '%d.%m.%Y %H:%M:%S'
    df['start_date_time'] = pd.to_datetime(df['start_date'] + ' ' + df['start_time'].astype(str), format=fmt)
    df['end_date_time'] = pd.to_datetime(df['end_date'] + ' ' + df['end_time'].astype(str), format=fmt)

    df['diff'] = df['end_date_time'] - df['start_date_time']
    # Whole minutes per row, then split into zero-padded hours and minutes
    total_minutes = (df['diff'] / np.timedelta64(1, 'm')).astype(int)
    df['HH:MM'] = (total_minutes // 60).astype(str).str.zfill(2) + ':' + (total_minutes % 60).astype(str).str.zfill(2)
    print(df[['start_date_time', 'end_date_time', 'HH:MM']])


    Output:

          start_date_time       end_date_time  HH:MM
    0 2020-12-29 11:10:00 2020-12-29 23:15:00  12:05
    1 2020-12-29 23:15:00 2020-12-30 05:15:00  06:00
    2 2020-12-30 05:15:00 2020-12-30 11:15:00  06:00
    3 2020-12-30 11:15:00 2020-12-30 17:15:00  06:00
    4 2020-12-30 17:15:00 2020-12-30 23:15:00  06:00
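
If some intervals can run past 24 hours, one possible variation is to build the string from `Series.dt.components`, folding whole days into the hour count. This is only a sketch: `format_hhmm` is a hypothetical helper name, the sample durations are made up for illustration, and it assumes non-negative durations with no missing values.

```python
import pandas as pd

def format_hhmm(delta: pd.Series) -> pd.Series:
    """Format a timedelta Series as zero-padded HH:MM strings (hypothetical helper)."""
    # components splits each duration into days, hours, minutes, seconds, ...
    parts = delta.dt.components
    # Fold whole days into hours so that 1 day 01:30 becomes 25:30
    hours = parts['days'] * 24 + parts['hours']
    # Leftover seconds are dropped, not rounded
    return hours.astype(str).str.zfill(2) + ':' + parts['minutes'].astype(str).str.zfill(2)

# Illustrative durations, not taken from the question's data
durations = pd.Series(pd.to_timedelta(['12:05:00', '06:00:00', '1 days 01:30:00']))
formatted = format_hhmm(durations).tolist()
print(formatted)  # ['12:05', '06:00', '25:30']
```

With the DataFrame from the answer, this would be called as `format_hhmm(df['diff'])`.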