Tags: python, pandas, typeerror

median() tries to change column to numeric


I am using median() inside an if-else list comprehension like this:

summary_frame = pd.DataFrame({
# ...
"50%": [df[col].median() if "float" or "int" or "time" in str(df[col].dtype) else df[col].mode() for col in df.columns]
# ...
})

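The intent is to use median() when the column's datatype is numerical, and to fall back to mode() when it is not. Calling mode() on its own works fine for non-numerical columns.

However, median() still ends up being called on the non-numerical columns. It tries to convert the series to numeric and throws a TypeError. Is there a way around this?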
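A minimal reproduction of the behaviour described above, assuming a recent pandas version (the column values are made up for illustration). mode() handles string data, but median() has to treat the values as numbers and raises a TypeError:

```python
import pandas as pd

# Made-up non-numeric column (object/string dtype).
s = pd.Series(["a", "x", "x", "foo"])

# mode() works on strings; it returns a Series of the most frequent value(s).
mode_values = s.mode().tolist()
print(mode_values)  # ['x']

# median() needs numeric data, so on strings it raises a TypeError.
median_error = None
try:
    s.median()
except TypeError as exc:
    median_error = exc
print(type(median_error).__name__)
```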


Solution

  • Change

    if "float" or "int" or "time" in str(df[col].dtype)
    

    to

    if any(t in str(df[col].dtype) for t in ("float", "int", "time"))
    

    Python's "or" does not work like English "or": it does not distribute over the "in" test. You can't write

    if x or y or z in something
    

    That's parsed as

    if x or y or (z in something)
    

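    Here x is the non-empty string "float", which is truthy, so the whole condition is always true. That means median() was being called on every column, including the non-numerical ones, which is where the TypeError came from.

    Full test code: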

    >>> df = pd.DataFrame({'a': [1, 2, 3, 2, 3, 6, 10], 'b': ['a', 'x', 'x', 'foo', 'b', 'x', 'y'] })
    >>> summary_frame = pd.DataFrame({'50%': [df[col].median() if any(t in str(df[col].dtype) for t in ("float", "int", "time")) else df[col].mode() for col in df.columns]})
    >>> summary_frame
                                 50%
    0                            3.0
    1  0    x
    Name: b, dtype: object
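The precedence problem can be seen without pandas at all. This sketch uses "object" as a stand-in for str(df[col].dtype) of a non-numeric column: the original condition short-circuits on the truthy string "float", while the any() version tests each substring separately.

```python
dtype_name = "object"  # stand-in for str(df[col].dtype) of a non-numeric column

# Parsed as: "float" or "int" or ("time" in dtype_name).
# "float" is a non-empty (truthy) string, so `or` returns it immediately
# and the `in` test is never evaluated.
buggy = "float" or "int" or "time" in dtype_name

# any() applies the `in` test to each substring in turn.
fixed = any(t in dtype_name for t in ("float", "int", "time"))

print(repr(buggy))  # 'float'
print(fixed)        # False
```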
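As the output shows, the second cell holds a whole Series rather than a single value, because mode() returns a Series (several values can tie for most frequent). A possible refinement, sketched under two assumptions: that taking the first mode is acceptable, and that pandas' dtype helpers in pandas.api.types are preferred over substring matching on the dtype name. The helper name middle_value is hypothetical.

```python
import pandas as pd
from pandas.api.types import (
    is_datetime64_any_dtype,
    is_numeric_dtype,
    is_timedelta64_dtype,
)


def middle_value(s: pd.Series):
    """Hypothetical helper: median for numeric/time-like columns, first mode otherwise."""
    if is_numeric_dtype(s) or is_datetime64_any_dtype(s) or is_timedelta64_dtype(s):
        return s.median()
    # mode() returns a Series (ties are possible), so take the first value.
    modes = s.mode()
    return modes.iloc[0] if not modes.empty else None


df = pd.DataFrame({'a': [1, 2, 3, 2, 3, 6, 10],
                   'b': ['a', 'x', 'x', 'foo', 'b', 'x', 'y']})
summary_frame = pd.DataFrame({'50%': [middle_value(df[col]) for col in df.columns]})
print(summary_frame)
```

With this version, the '50%' column holds the scalars 3.0 and 'x' instead of a nested Series.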