Tags: python, pandas, mean, python-datetime

Computing the mean of Python datetime values


I have a datetime attribute:

import datetime

import numpy as np
import pandas as pd

d = {
    'DOB': pd.Series([
        datetime.datetime(2014, 7, 9),
        datetime.datetime(2014, 7, 15),
        np.datetime64('NaT')  # missing value
    ], index=['a', 'b', 'c'])
}
df_test = pd.DataFrame(d)

I would like to compute the mean for that attribute. Running mean() causes an error:

TypeError: reduction operation 'mean' not allowed for this dtype

I also tried a solution proposed elsewhere, but it doesn't work: running the function suggested there raises

OverflowError: Python int too large to convert to C long

What would you propose? The result for the above dataframe should be equivalent to

datetime.datetime(2014, 7, 12).
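
For reference, the expected value can be checked by hand with plain datetime arithmetic. This is only a sketch of the arithmetic for the two non-missing dates above (the NaT entry is ignored), not a general solution:

```python
import datetime

d1 = datetime.datetime(2014, 7, 9)
d2 = datetime.datetime(2014, 7, 15)

# Two datetimes cannot be added together, but their difference is a
# timedelta, and a timedelta can be divided.
midpoint = d1 + (d2 - d1) / 2
print(midpoint)
```

The two dates are 6 days apart, so the midpoint lies 3 days after the earlier one.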

Solution

  • A datetime series has no mean here, but a Timedelta series does. So find the minimum value and subtract it from the series to get a series of Timedelta values. Then take the mean of those and add it back to the minimum.

    dob = df_test.DOB
    m = dob.min()
    (m + (dob - m).mean()).to_pydatetime()
    
    datetime.datetime(2014, 7, 12, 0, 0)
    

    As a one-liner:

    df_test.DOB.pipe(lambda d: (lambda m: m + (d - m).mean())(d.min())).to_pydatetime()
    

    To @ALollz's point

    You can use the epoch pd.Timestamp(0) as the reference point instead of the minimum:

    df_test.DOB.pipe(lambda d: (lambda m: m + (d - m).mean())(pd.Timestamp(0))).to_pydatetime()
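
The two variants above can be folded into one small helper. The following is a minimal, self-contained sketch; `mean_datetime` and its `reference` parameter are hypothetical names chosen for illustration and are not part of pandas. The sample series is built with `pd.to_datetime` rather than the mixed list from the question, on the assumption that both produce the same datetime64 column.

```python
import datetime

import pandas as pd


def mean_datetime(series, reference=None):
    """Mean of a datetime64 Series via Timedelta offsets from a reference.

    Hypothetical helper for illustration; not a pandas function.
    """
    # Default to the series minimum; pd.Timestamp(0) (the Unix epoch)
    # can be passed instead.
    ref = series.min() if reference is None else reference
    # Subtracting a Timestamp gives a Timedelta series; mean() skips NaT.
    return (ref + (series - ref).mean()).to_pydatetime()


dob = pd.Series(
    pd.to_datetime(["2014-07-09", "2014-07-15", None]),  # None becomes NaT
    index=["a", "b", "c"],
)

result_min = mean_datetime(dob)
result_epoch = mean_datetime(dob, reference=pd.Timestamp(0))
print(result_min)
print(result_epoch)
```

Both calls should agree for this data. Newer pandas releases also appear to accept `mean()` directly on a datetime64 Series, so it may be worth trying that first before reaching for a workaround.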