Tags: python, pandas, dataframe, sorting, case-insensitive

Sort dataframe by multiple columns text and numeric while ignoring case


How can I sort a Pandas dataframe with text and numeric columns while ignoring the case?

import pandas as pd

df = pd.DataFrame({
    'A': list('aabbCC'),
    'B': [2, 1, 2, 1, 10, 1],
})

Based on the answer to "Sort dataframe by multiple columns while ignoring case", I tried:

df.sort_values(by=[ 'A', 'B'], inplace=True, key=lambda x: x.str.lower())

This raises an error:

builtins.AttributeError: Can only use .str accessor with string values!

How should I modify the key function?


Solution

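  • The key function is applied to each column in by separately, so the numeric column B has no .str accessor. Use a conditional expression that lowercases only the non-numeric columns: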

    import numpy as np
    # Return numeric columns unchanged; lowercase the text columns.
    f = lambda x: x if np.issubdtype(x.dtype, np.number) else x.str.lower()
    df.sort_values(by=['A', 'B'], inplace=True, key=f)
    

    Or sort a copy with column A lowercased, then use its index to reorder the original:

    df = df.loc[df.assign(A=df['A'].str.lower()).sort_values(by=[ 'A', 'B']).index]
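For reference, here is a minimal self-contained sketch of both approaches on the sample data from the question. The helper name casefold_text is made up for illustration, and it swaps in pd.api.types.is_numeric_dtype for the np.issubdtype check; that is one possible way to detect numeric columns, not the only one. It needs pandas 1.1 or later, where sort_values accepts key.

```python
import pandas as pd

df = pd.DataFrame({
    'A': list('aabbCC'),
    'B': [2, 1, 2, 1, 10, 1],
})

def casefold_text(col):
    """Hypothetical helper: sort_values calls the key once per column in `by`."""
    if pd.api.types.is_numeric_dtype(col):
        return col              # numeric column: compare values as they are
    return col.str.lower()      # text column: compare lowercased values

# Approach 1: key function that only touches text columns.
result = df.sort_values(by=['A', 'B'], key=casefold_text)

# Approach 2: sort a lowercased copy, then reorder the original by its index.
order = df.assign(A=df['A'].str.lower()).sort_values(by=['A', 'B']).index
alt = df.loc[order]

print(result)
# index order: [1, 0, 3, 2, 5, 4]; column A keeps its original case
```

Both versions return a new frame instead of using inplace=True, which makes it easier to compare the two results or chain further calls.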