Search code examples
pythonpandasdataframepandas-groupbymean

Pandas Dataframe Timedelta mean() not showing correct values


I've searched and searched and haven't found a solution or just an answer to why this is happening.

I'm trying to determine the mean() per Borough for Request_Closing_Time in my pandas dataframe. When I groupby Borough and then get the mean for Request Closing Time I'm getting very large values, 13 to 44 days and then even 87,426 days for one result. The max value is only 24 days and the average for the whole dataframe is only 4 hours. If I single out an individual Borough with the get_group command then I seem to get a more normal answer which I believe is correct.

Any ideas as to why values for mean are so far off when groupby is by Borough only?

In [283]:nyc_sr_calls_trimmed.describe()
Out[283]:        Unique Key     Request_Closing_Time       Incident Zip
         count  3.006980e+05    298534  298083.000000
         mean   3.130054e+07    0 days 04:18:51.832782865   10848.888645
         std    5.738547e+05    0 days 06:05:22.141833856   583.182081
         min    3.027948e+07    0 days 00:01:00 83.000000
         25%    3.080118e+07    0 days 01:16:33 10310.000000
         50%    3.130436e+07    0 days 02:42:55.500000  11208.000000
         75%    3.178446e+07    0 days 05:21:00 11238.000000
         max    3.231065e+07    24 days 16:52:22    11697.000000

In [284]: nyc_sr_calls_trimmed['Request_Closing_Time'].mean()
Out[284]:Timedelta('0 days 04:18:51.832782865')

In [285]:by_burrough = nyc_sr_calls_trimmed.groupby(['Borough'])
         by_burrough['Request_Closing_Time'].mean(numeric_only=None).dt.floor('s')

Out[285]:Borough
         BRONX              -13 days +03:21:47
         BROOKLYN           -13 days +03:18:24
         MANHATTAN          -36 days +14:38:51
         QUEENS             -16 days +07:37:05
         STATEN ISLAND      -44 days +22:01:24
         Unspecified     -87426 days +14:15:03
         Name: Request_Closing_Time, dtype: timedelta64[ns]

In [286]:b = by_burrough.get_group('QUEENS')
         b['Request_Closing_Time'].mean(numeric_only=False)

Out[286]:Timedelta('0 days 04:52:16.450111002')

Any help would be appreciated. I'm not sure what i'm not thinking of or accounting for. Thanks.


Solution

  • "If I single out an individual Borough with the get_group command then I seem to get a more normal answer which I believe is correct".

    Since you mentioned the above, something I would try is to get the mean of Request_Closing_Time for each Borough individually, as follows:

    by_burrough["Request_Closing_Time"].apply(lambda df_group: df_group.mean(numeric_only=False))
    

    or

    by_burrough.apply(lambda df_group: df_group["Request_Closing_Time"].mean(numeric_only=False))