Search code examples
pythonpandassortingapplynatsort

I am stuck in writing one code. I wanted to sort one column first by integer then combination of integer and string then string


I wanted to sort one column first by integer then combination of integer and string then string.

Below is my data frame.

import pandas as pd

df = pd.DataFrame({'name': ['aaa', 'ccc', 'bbb', '1c2z45zz', 55, 3, 'et45']})

Below is my expected output.


df = pd.DataFrame({'name': [3, 55, '1c2z45zz', 'et45', 'aaa', 'bbb', 'ccc']})

I have tried many method but all gives first combination of string and integer then integer then string.

import natsort

df.iloc[natsort.index_humansorted(df.name)]

then I tried below one.

max_length = max(df.name.str.len())
df['sort_name']=df.name.str.pad(max_length,'right','~')
df.sort_values('sort_name', inplace=True, ignore_index=True)

and the last I tried.

df_sorted = df.astype(str).sort_values(by='A',key=lambda x: (x.isdigit(), x))

Solution

  • You can use numpy.lexsort with your different conditions:

    import natsort
    
    # first sort by pure number
    m1 = ~df['name'].astype(str).str.isdigit()
    # then by strings having a number
    m2 = ~df['name'].astype(str).str.contains(r'\d')
    # optional: sort remaining in human sorted order
    m3 = np.argsort(natsort.index_humansorted(df['name']))
    
    out = df.iloc[np.lexsort([m3, m2, m1])]
    
    print(out)
    

    Output:

           name
    5         3
    4        55
    3  1c2z45zz
    6      et45
    0       aaa
    2       bbb
    1       ccc
    

    Other example:

            name
    6          1
    7          2
    5          3
    8          3
    9          4
    4          5
    10         5
    11         6
    12      1000
    3   1c2z45zz
    14      et45
    0        aaa
    2        bbb
    1        ccc
    13     p_o_j