I have a datetime attribute:
import datetime
import numpy as np
import pandas as pd

d = {
    'DOB': pd.Series([
        datetime.datetime(2014, 7, 9),
        datetime.datetime(2014, 7, 15),
        np.datetime64('NaT')
    ], index=['a', 'b', 'c'])
}
df_test = pd.DataFrame(d)
I would like to compute the mean of that column, but calling mean() on it raises an error:
TypeError: reduction operation 'mean' not allowed for this dtype
I also tried a solution proposed elsewhere, but it doesn't work: running the function proposed there raises
OverflowError: Python int too large to convert to C long
What would you propose? The result for the above dataframe should be equivalent to
datetime.datetime(2014, 7, 12).
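You can take the mean of Timedelta values. So find the minimum value and subtract it from the series to get a series of Timedelta values, then take the mean and add it back to the minimum. The NaT entry is skipped by mean() by default, so the result is the midpoint of the two valid dates.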
dob = df_test.DOB
m = dob.min()
(m + (dob - m).mean()).to_pydatetime()
datetime.datetime(2014, 7, 12, 0, 0)
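The same idea can be wrapped in a small reusable function. This is a sketch, not part of the original answer: the name mean_datetime and its ref parameter are hypothetical. It assumes a datetime64 Series such as df_test.DOB; the reference defaults to the series minimum, as in the answer above, but any timestamp can be passed instead.

```python
import datetime

import numpy as np
import pandas as pd


def mean_datetime(s: pd.Series, ref=None) -> pd.Timestamp:
    """Hypothetical helper: mean of a datetime64 Series via Timedelta arithmetic.

    Subtracts a reference timestamp (the series minimum by default),
    averages the resulting Timedelta values (NaT is skipped by mean()),
    and adds the reference back.
    """
    if ref is None:
        ref = s.min()  # min() also skips NaT
    ref = pd.Timestamp(ref)  # accept datetime.datetime or Timestamp
    return ref + (s - ref).mean()


# Same data as the question's DOB column
dob = pd.Series(
    [datetime.datetime(2014, 7, 9), datetime.datetime(2014, 7, 15), np.datetime64("NaT")],
    index=["a", "b", "c"],
)

print(mean_datetime(dob).to_pydatetime())                        # 2014-07-12 00:00:00
print(mean_datetime(dob, ref=pd.Timestamp(0)).to_pydatetime())   # 2014-07-12 00:00:00
```

Passing ref=pd.Timestamp(0) reproduces the epoch variant; both calls should return the same date for this data.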
As a one-liner:
df_test.DOB.pipe(lambda d: (lambda m: m + (d - m).mean())(d.min())).to_pydatetime()
I use the epoch pd.Timestamp(0) as the reference instead of the minimum:
df_test.DOB.pipe(lambda d: (lambda m: m + (d - m).mean())(pd.Timestamp(0))).to_pydatetime()
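Why either reference works: for the non-missing values x_1, ..., x_n (NaT excluded from n) and any reference timestamp r, the reference cancels out, so in exact arithmetic the minimum and the epoch give the same mean. For the question's data with r = 2014-07-09, the differences are 0 and 6 days, their mean is 3 days, and adding it back gives 2014-07-12.

```latex
r + \frac{1}{n}\sum_{i=1}^{n}(x_i - r)
  = r + \frac{1}{n}\sum_{i=1}^{n}x_i - r
  = \frac{1}{n}\sum_{i=1}^{n}x_i
```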