Search code examples
pythonpandaspython-datetimetimedelta

Compare two time columns, then return minimum time difference in hh:mm:ss in new column in python dataframe


from datetime import datetime
start='12:25:03'
format = '%H:%M:%S'
startDateTime = datetime.strptime(start, format)
end='12:30:40'
endDateTime = datetime.strptime(end, format)
diff = endDateTime - startDateTime
print(diff)
0:05:37

Above code works fine, but when I apply the same to entire column using lambda function I get result in different format, I would like to get values of Diff column in hh:mm:ss format.

t1 - Object type 
t2 - Object type 
Diff - timedelta64[ns] type



 df["Diff"] = df.apply(lambda x: datetime.strptime(x["t1"], format) - datetime.strptime(x["t2"], format), axis = 1)
 df.head()
           t1          t2             Diff
    0   01:27:19    01:28:58    -1 days +23:58:21
    1   01:49:57    01:50:40    -1 days +23:59:17
    2   03:35:24    03:36:14    -1 days +23:59:10

Solution

  • related: Format timedelta to string

    You can write your own formatter for the timedelta objects, e.g.

    def formatTimedelta(td):
        """
        format a timedelta object to string, in HH:MM:SS format (seconds floored).
        negative timedeltas will be prefixed with a minus, '-'.
        """
        total = td.total_seconds()
        prefix, total = ('-', total*-1) if total < 0 else ('', total)
        h, r = divmod(total, 3600)
        m, s = divmod(r, 60)
        return f"{prefix}{int(h):02d}:{int(m):02d}:{int(s):02d}"
    

    which would give you for the example df

    df
             t1        t2
    0  01:27:19  01:28:58
    1  01:49:57  01:50:40
    2  03:35:24  03:36:14
    
    # to datetime
    df['t1'] = pd.to_datetime(df['t1'])
    df['t2'] = pd.to_datetime(df['t2'])
    
    # calculate timedeltas and format
    df['diff0'] = (df['t1']-df['t2']).apply(formatTimedelta)
    df['diff1'] = (df['t2']-df['t1']).apply(formatTimedelta)
    
    df['diff0']
    0    -00:01:39
    1    -00:00:43
    2    -00:00:50
    Name: diff0, dtype: object
    
    df['diff1']
    0    00:01:39
    1    00:00:43
    2    00:00:50
    Name: diff1, dtype: object