How would you sort the following dataframe:
df = pd.DataFrame({'a':['abc_1.2.6','abc_1.2.60','abc_1.2.7','abc_1.2.9','abc_1.3.0','abc_1.3.10','abc_1.3.100','abc_1.3.11'], 'b':[1,2,3,4,5,6,7,8]})
>>>
a b
0 abc_1.2.6 1
1 abc_1.2.60 2
2 abc_1.2.7 3
3 abc_1.2.9 4
4 abc_1.3.0 5
5 abc_1.3.10 6
6 abc_1.3.100 7
7 abc_1.3.11 8
to achieve this output?
>>>
a b
0 abc_1.2.6 1
1 abc_1.2.7 3
2 abc_1.2.9 4
3 abc_1.2.60 2
4 abc_1.3.0 5
5 abc_1.3.10 6
6 abc_1.3.11 8
7 abc_1.3.100 7
I understand that integers in strings can be accessed through string transformations, however I'm unsure how to handle this in a dataframe. Obviously df.sort_values(by=['a'],ignore_index=True)
is unhelpful in this case.
One way to use is natsorted
with iloc
:
#pip install natsort
from natsort import natsorted
out = df.iloc[natsorted(range(len(df)), key=lambda x: df.loc[x, "a"])]
Or even shorter, as suggested by @Stef, use natsort_key
as a key of sort_values
:
from natsort import natsort_key
out = df.sort_values(by="a", key=natsort_key, ignore_index=True)
Output :
print(out)
a b
0 abc_1.2.6 1
1 abc_1.2.7 3
2 abc_1.2.9 4
3 abc_1.2.60 2
4 abc_1.3.0 5
5 abc_1.3.10 6
6 abc_1.3.11 8
7 abc_1.3.100 7