
Calculating the median (or mean) of a timedelta list


I'm trying to find the median of a list of Timedelta objects taken from a pandas DataFrame. I tried the statistics library like this:

import statistics as st

newList = list(DF.sort_values(['TimeDelta'])['TimeDelta'])
TDmedian = st.median(newList)

Here `st` is the alias under which I imported the statistics library.

But I get the error:

`TypeError: unsupported operand type(s) for /: 'str' and 'int'`

I tried to write a function to calculate it:

def date_median(date_list):
    # Assumes date_list is already sorted.
    length = len(date_list)
    # An odd-length list has a single middle value, which is the median.
    if length % 2 != 0:
        return date_list[length // 2]
    # An even-length list has two middle values (indices length//2 - 1
    # and length//2); the median is their mean.
    lower = date_list[length // 2 - 1]
    upper = date_list[length // 2]
    return (lower + upper) / 2

And I use it like this:

`TAmedian = date_median(newList)`

And I get this error:

`TypeError: unsupported operand type(s) for /: 'str' and 'int'`

Is there an easier way to do this? If not, what am I doing wrong?

Sample Data:

DF['TimeDelta'] = [0 days 00:00:36.35700000,0 days 00:47:11.213000000]

Solution

  • Why convert to a list? A pandas Series already has everything you need, provided the column holds Timedelta values (created here with pd.to_timedelta):

    import pandas as pd
    
    DF = pd.DataFrame({'TimeDelta': pd.to_timedelta(['0 days 00:00:36.35700000', 
                                                     '0 days 00:47:11.213000000'])})
    
    DF['TimeDelta'].mean()
    # Timedelta('0 days 00:23:53.785000')
    DF['TimeDelta'].median()
    # Timedelta('0 days 00:23:53.785000')
    

    Of course, if you don't have a DataFrame in the first place, you can do without one, e.g.:

    pd.to_timedelta(['0 days 00:00:36.35700000', '0 days 00:47:11.213000000']).median()
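
As for what went wrong: the `'str' and 'int'` in both tracebacks suggests the column holds strings rather than Timedelta objects, so adding two of them concatenates text and the division by 2 fails. Below is a minimal sketch of converting the column first. It assumes the values are strings in the format shown in the sample data; the third row is made up for illustration so that the median is a single middle value.

```python
import statistics as st

import pandas as pd

# Hypothetical frame mirroring the question: the column holds plain strings.
DF = pd.DataFrame({"TimeDelta": ["0 days 00:00:36.357000",
                                 "0 days 00:47:11.213000",
                                 "0 days 00:10:00"]})  # made-up third row

# Convert the strings to Timedelta values once, up front.
DF["TimeDelta"] = pd.to_timedelta(DF["TimeDelta"])

# The Series methods now work directly, without sorting or building a list.
median_td = DF["TimeDelta"].median()
mean_td = DF["TimeDelta"].mean()

# After conversion, the statistics module handles the values as well,
# because Timedelta supports comparison, addition and division by a number.
list_median = st.median(DF["TimeDelta"].tolist())

print(median_td, mean_td, list_median)
```

If some strings may be unparsable, `pd.to_timedelta(..., errors="coerce")` turns them into `NaT` instead of raising, and the Series `median()` and `mean()` skip `NaT` by default.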