I wanted to sort one column first by integer then combination of integer and string then string.
Below is my data frame.
import pandas as pd
df = pd.DataFrame({'name': ['aaa', 'ccc', 'bbb', '1c2z45zz', 55, 3, 'et45']})
Below is my expected output.
df = pd.DataFrame({'name': [3, 55, '1c2z45zz', 'et45', 'aaa', 'bbb', 'ccc']})
I have tried many method but all gives first combination of string and integer then integer then string.
import natsort
df.iloc[natsort.index_humansorted(df.name)]
then I tried below one.
max_length = max(df.name.str.len())
df['sort_name']=df.name.str.pad(max_length,'right','~')
df.sort_values('sort_name', inplace=True, ignore_index=True)
and the last I tried.
df_sorted = df.astype(str).sort_values(by='A',key=lambda x: (x.isdigit(), x))
You can use numpy.lexsort
with your different conditions:
import natsort
# first sort by pure number
m1 = ~df['name'].astype(str).str.isdigit()
# then by strings having a number
m2 = ~df['name'].astype(str).str.contains(r'\d')
# optional: sort remaining in human sorted order
m3 = np.argsort(natsort.index_humansorted(df['name']))
out = df.iloc[np.lexsort([m3, m2, m1])]
print(out)
Output:
name
5 3
4 55
3 1c2z45zz
6 et45
0 aaa
2 bbb
1 ccc
Other example:
name
6 1
7 2
5 3
8 3
9 4
4 5
10 5
11 6
12 1000
3 1c2z45zz
14 et45
0 aaa
2 bbb
1 ccc
13 p_o_j