I'm trying to find the median of a list of Timedelta objects taken from a pandas DataFrame. I've tried using the statistics library like this:
newList= list(DF.sort_values(['TimeDelta'])['TimeDelta'])
TDmedian = (st.median(newList))
`st` is the alias I imported the statistics library under.
But I get the error:
`TypeError: unsupported operand type(s) for /: 'str' and 'int'`
I tried to make a function to calculate it: `
def date_median(date_list):
    length = len(date_list)
    print(length)
    # An odd-length list has a single middle value, which is the median
    if length % 2 != 0:
        return date_list[length // 2]
    else:
        # For an even length, average the two middle values
        # (indices length//2 - 1 and length//2 of the sorted list)
        print(length // 2 - 1, length // 2)
        lower = date_list[length // 2 - 1]
        upper = date_list[length // 2]
        return (lower + upper) / 2`
And I use it like this:
`TAmedian = date_median(newList)`
And I get this error:
`TypeError: unsupported operand type(s) for /: 'str' and 'int'`
Is there an easier way to do this, and if not, what am I doing wrong?
Sample Data:
DF['TimeDelta'] = [0 days 00:00:36.35700000,0 days 00:47:11.213000000]
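Why convert to a list? The `'str'` in the `TypeError` suggests the column holds strings rather than Timedelta values. Once the values are converted with `pd.to_timedelta`, `pandas.DataFrame` has everything you need built in: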
import pandas as pd
DF = pd.DataFrame({'TimeDelta': pd.to_timedelta(['0 days 00:00:36.35700000',
'0 days 00:47:11.213000000'])})
DF['TimeDelta'].mean()
# Timedelta('0 days 00:23:53.785000')
DF['TimeDelta'].median()
# Timedelta('0 days 00:23:53.785000')
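If the existing column really does hold strings (which is only a guess based on the error message), converting it in place might look like the sketch below. The frame `df_str` and its three sample values are made up for illustration; `errors="coerce"` is optional and turns anything unparseable into `NaT`, which `median()` skips by default.

```python
import pandas as pd

# Hypothetical frame whose column holds strings, not Timedelta values
df_str = pd.DataFrame({"TimeDelta": ["0 days 00:00:36.357",
                                     "0 days 00:47:11.213",
                                     "0 days 00:10:00"]})

# Convert the strings to Timedelta values; unparseable entries become NaT
df_str["TimeDelta"] = pd.to_timedelta(df_str["TimeDelta"], errors="coerce")

# With an odd number of rows the median is simply the middle value
median_td = df_str["TimeDelta"].median()
print(median_td)
```

No sorting is needed beforehand, since `median()` does not depend on row order.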
Of course, if you don't have a DataFrame in the first place, you can do without one, e.g.:
pd.to_timedelta(['0 days 00:00:36.35700000', '0 days 00:47:11.213000000']).median()
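If you would rather stay with the `statistics` module, it appears to work fine once the list holds Timedelta objects instead of strings. The following is a minimal sketch, assuming the raw values are strings like the sample data; the names `raw`, `new_list` and `td_median` are only illustrative.

```python
import statistics as st
import pandas as pd

raw = ["0 days 00:00:36.357", "0 days 00:47:11.213"]

# On the raw strings, st.median adds the two middle values and divides by 2,
# which raises the TypeError from the question (str / int)
try:
    st.median(raw)
    failed_on_strings = False
except TypeError:
    failed_on_strings = True

# Convert first: iterating a TimedeltaIndex yields Timedelta objects
new_list = list(pd.to_timedelta(raw))

# Timedelta supports addition and division, so st.median can average them
td_median = st.median(new_list)
print(td_median)
```

Note that the error only shows up for an even number of strings, because an odd-length list returns the middle element without any arithmetic.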