The difference between pandas Timedelta and timedelta64[ns]?


I want to use the function total_seconds.

I obtain the difference between two dates by subtracting the beginning from the end.

df["diff"] = (df["End"] - df["Start"])

which yields:

0      0 days 00:12:08
1      0 days 00:18:56
2      0 days 00:17:17
3      0 days 00:48:46
4      0 days 00:21:02
             ...      
7015   0 days 00:14:32
7016   0 days 00:08:33
7017   0 days 00:19:38
7018   0 days 00:18:41
7019   0 days 00:37:35
Name: diff, Length: 7020, dtype: timedelta64[ns]

There is a function total_seconds, but it doesn't work on the df["diff"] column that I created. Is timedelta64[ns] something different?

The function total_seconds() works if I call pd.Timedelta() on an individual element of df["diff"] and then call total_seconds() on the result.

I would like some clarification on dtype here and how to use the total_seconds function on the whole series.


Solution

  • Timedelta.total_seconds is a method of a single Timedelta instance, so you can call it on one element:

    >>> df['diff'].iloc[0].total_seconds()
    728.0
    

    To get the total seconds for a whole Series of Timedelta values, use the dt accessor. timedelta64[ns] is the dtype of the column, while Timedelta is the type of each individual element you pull out of it. The Series itself has no total_seconds method; the vectorized version lives under dt:

    >>> df['diff'].dt.total_seconds()
    0        728.0
    1       1136.0
    2       1037.0
    3       2926.0
    4       1262.0
             ...  
    7015     872.0
    7016     513.0
    7017    1178.0
    7018    1121.0
    7019    2255.0
    Name: diff, dtype: float64
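
To make the dtype versus scalar-type distinction concrete, here is a minimal sketch. The Start and End values are hypothetical stand-ins for the question's data, chosen so that the first two differences match the output shown there.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's Start/End columns.
df = pd.DataFrame({
    "Start": pd.to_datetime(["2021-01-01 10:00:00", "2021-01-01 11:00:00"]),
    "End": pd.to_datetime(["2021-01-01 10:12:08", "2021-01-01 11:18:56"]),
})
df["diff"] = df["End"] - df["Start"]

# The column has a timedelta64 dtype...
print(df["diff"].dtype)

# ...but a single element comes back as a pandas Timedelta scalar,
# which is why total_seconds() works on it directly.
first = df["diff"].iloc[0]
print(type(first))
print(first.total_seconds())  # 728.0

# For the whole column, go through the dt accessor.
seconds = df["diff"].dt.total_seconds()
print(seconds.tolist())  # [728.0, 1136.0]
```

Calling df["diff"].total_seconds() without dt raises an AttributeError, which is most likely the failure described in the question.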
    

    If your column holds strings rather than timedeltas, as in this example:

    import pandas as pd

    data = {'diff': ['0 days 00:12:08', '0 days 00:18:56', '0 days 00:17:17']}
    df = pd.DataFrame(data)
    

    You can convert the values one at a time:

    >>> df['diff'].apply(pd.Timedelta)
    0   0 days 00:12:08
    1   0 days 00:18:56
    2   0 days 00:17:17
    Name: diff, dtype: timedelta64[ns]
    
    # OR
    
    >>> [pd.Timedelta(x) for x in df['diff']]
    [Timedelta('0 days 00:12:08'),
     Timedelta('0 days 00:18:56'),
     Timedelta('0 days 00:17:17')]
    

    Or you can convert the whole Series in one call:

    >>> pd.to_timedelta(df['diff'])
    0   0 days 00:12:08
    1   0 days 00:18:56
    2   0 days 00:17:17
    Name: diff, dtype: timedelta64[ns]
    
    # OR
    
    >>> pd.TimedeltaIndex(df['diff'])
    TimedeltaIndex(['0 days 00:12:08', '0 days 00:18:56', '0 days 00:17:17'],
                   dtype='timedelta64[ns]', name='diff', freq=None)
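
Putting the conversion and the accessor together, a short end-to-end sketch might look like the following. It assumes the column starts out as strings, as in the example above. The division by pd.Timedelta(seconds=1) is an alternative that is not part of the original answer, shown only for comparison.

```python
import pandas as pd

df = pd.DataFrame(
    {"diff": ["0 days 00:12:08", "0 days 00:18:56", "0 days 00:17:17"]}
)

# A column of strings has no timedelta methods under .dt,
# so convert it to a timedelta dtype first.
df["diff"] = pd.to_timedelta(df["diff"])

# Vectorized total seconds for every row.
total = df["diff"].dt.total_seconds()
print(total.tolist())  # [728.0, 1136.0, 1037.0]

# Alternative (an addition, not from the answer): dividing by a
# one-second Timedelta also yields the number of seconds as floats.
alt = df["diff"] / pd.Timedelta(seconds=1)
print(alt.tolist())  # [728.0, 1136.0, 1037.0]
```

The division form is handy when you want another unit: swapping in pd.Timedelta(minutes=1) gives minutes instead of seconds.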