I want to use the total_seconds function.
I compute the difference between two dates by subtracting the start from the end:
df["diff"] = (df["End"] - df["Start"])
which yields:
0 0 days 00:12:08
1 0 days 00:18:56
2 0 days 00:17:17
3 0 days 00:48:46
4 0 days 00:21:02
...
7015 0 days 00:14:32
7016 0 days 00:08:33
7017 0 days 00:19:38
7018 0 days 00:18:41
7019 0 days 00:37:35
Name: diff, Length: 7020, dtype: timedelta64[ns]
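For anyone who wants to reproduce this, here is a minimal sketch of the setup. The Start and End values are hypothetical sample timestamps, not the real 7020-row data:

```python
import pandas as pd

# Hypothetical sample timestamps; the real frame has 7020 rows.
df = pd.DataFrame({
    "Start": pd.to_datetime(["2023-01-01 08:00:00", "2023-01-01 09:30:00"]),
    "End": pd.to_datetime(["2023-01-01 08:12:08", "2023-01-01 09:48:56"]),
})

# Subtracting two datetime64[ns] columns yields a timedelta64[ns] column.
df["diff"] = df["End"] - df["Start"]
print(df["diff"].dtype)  # timedelta64[ns]
```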
There is a total_seconds function, but it doesn't work on the df["diff"] column I created. Is timedelta64[ns] something different?
The total_seconds() function works if I call pd.Timedelta() on an individual element of df["diff"] and then call total_seconds().
I would like some clarification on the dtype here and on how to use total_seconds on the whole Series.
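You can use the Timedelta.total_seconds method to get the total seconds of a single Timedelta instance. Selecting one element of a timedelta64[ns] Series already returns a pd.Timedelta, so there is no need to wrap it in pd.Timedelta() first: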
>>> df['diff'].iloc[0].total_seconds()
728.0
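To make the dtype question concrete, here is a small sketch with made-up durations. timedelta64[ns] is NumPy's duration dtype, stored as 64-bit counts of nanoseconds. pandas hands out individual elements as pd.Timedelta objects, whereas the raw scalars from .to_numpy() are numpy.timedelta64 values, which have no total_seconds method. The Series itself has none either:

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.to_timedelta(["00:12:08", "00:18:56"]), name="diff")

# Element access through pandas returns a pandas Timedelta scalar.
first = s.iloc[0]
print(type(first).__name__, first.total_seconds())  # Timedelta 728.0

# The underlying NumPy array holds numpy.timedelta64 scalars instead,
# which do not provide total_seconds().
raw = s.to_numpy()[0]
print(isinstance(raw, np.timedelta64), hasattr(raw, "total_seconds"))  # True False

# The Series is a container, not a Timedelta: s.total_seconds() would raise AttributeError.
print(hasattr(s, "total_seconds"))  # False
```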
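But to get the total seconds of every element of a Series, you have to go through the dt accessor. A Series with dtype timedelta64[ns] stores its durations as 64-bit nanosecond counts; it is a container of Timedelta values, not a Timedelta itself, so it has no total_seconds method of its own. The dt accessor applies Timedelta methods such as total_seconds to every element: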
>>> df['diff'].dt.total_seconds()
0 728.0
1 1136.0
2 1037.0
3 2926.0
4 1262.0
...
7015 872.0
7016 513.0
7017 1178.0
7018 1121.0
7019 2255.0
Name: diff, Length: 7020, dtype: float64
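An equivalent vectorized alternative, sketched here on three of the values above, is to divide by a one-second Timedelta. Both approaches return float64, and both give NaN for missing (NaT) durations:

```python
import pandas as pd

s = pd.Series(pd.to_timedelta(["00:12:08", "00:18:56", "00:17:17"]), name="diff")

# The .dt accessor applies Timedelta.total_seconds to every element.
via_accessor = s.dt.total_seconds()

# Dividing by a one-second Timedelta gives the same float64 values.
via_division = s / pd.Timedelta(seconds=1)

print(via_accessor.tolist())              # [728.0, 1136.0, 1037.0]
print(via_division.equals(via_accessor))  # True
```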
If instead your diff column holds strings (object dtype), as in this example:
import pandas as pd
data = {'diff': ['0 days 00:12:08', '0 days 00:18:56', '0 days 00:17:17']}
df = pd.DataFrame(data)
You can convert each value:
>>> df['diff'].apply(pd.Timedelta)
0 0 days 00:12:08
1 0 days 00:18:56
2 0 days 00:17:17
Name: diff, dtype: timedelta64[ns]
# OR
>>> [pd.Timedelta(x) for x in df['diff']]
[Timedelta('0 days 00:12:08'),
Timedelta('0 days 00:18:56'),
Timedelta('0 days 00:17:17')]
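The per-element route described in the question can be written as a list comprehension. It works, but it loops in Python, so the vectorized options are typically faster on a column of thousands of rows:

```python
import pandas as pd

df = pd.DataFrame({"diff": ["0 days 00:12:08", "0 days 00:18:56", "0 days 00:17:17"]})

# Parse each string into a Timedelta, then call the scalar method on it.
seconds = [pd.Timedelta(x).total_seconds() for x in df["diff"]]
print(seconds)  # [728.0, 1136.0, 1037.0]
```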
Or you can convert the whole column at once:
>>> pd.to_timedelta(df['diff'])
0 0 days 00:12:08
1 0 days 00:18:56
2 0 days 00:17:17
Name: diff, dtype: timedelta64[ns]
# OR
>>> pd.TimedeltaIndex(df['diff'])
TimedeltaIndex(['0 days 00:12:08', '0 days 00:18:56', '0 days 00:17:17'],
dtype='timedelta64[ns]', name='diff', freq=None)
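Putting it together for the string case: after converting, total_seconds is reached through dt on a Series, or called directly on a TimedeltaIndex. A minimal sketch using the three sample strings:

```python
import pandas as pd

df = pd.DataFrame({"diff": ["0 days 00:12:08", "0 days 00:18:56", "0 days 00:17:17"]})

# Series route: convert to timedelta64[ns], then go through the .dt accessor.
df["seconds"] = pd.to_timedelta(df["diff"]).dt.total_seconds()

# Index route: a TimedeltaIndex exposes total_seconds() directly, without .dt.
idx_seconds = pd.TimedeltaIndex(df["diff"]).total_seconds()

print(df["seconds"].tolist())  # [728.0, 1136.0, 1037.0]
print(list(idx_seconds))       # [728.0, 1136.0, 1037.0]
```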