Search code examples
pythonpandasstringsortinginteger

How do you sort a dataframe with the integer in a column with strings and integers on every row?


How would you sort the following dataframe:

df = pd.DataFrame({'a':['abc_1.2.6','abc_1.2.60','abc_1.2.7','abc_1.2.9','abc_1.3.0','abc_1.3.10','abc_1.3.100','abc_1.3.11'], 'b':[1,2,3,4,5,6,7,8]})

>>>
    a           b
0   abc_1.2.6   1
1   abc_1.2.60  2
2   abc_1.2.7   3
3   abc_1.2.9   4
4   abc_1.3.0   5
5   abc_1.3.10  6
6   abc_1.3.100 7
7   abc_1.3.11  8

to achieve this output?

>>>
    a           b
0   abc_1.2.6   1
1   abc_1.2.7   3
2   abc_1.2.9   4
3   abc_1.2.60  2
4   abc_1.3.0   5
5   abc_1.3.10  6
6   abc_1.3.11  8
7   abc_1.3.100 7

I understand that integers in strings can be accessed through string transformations, however I'm unsure how to handle this in a dataframe. Obviously df.sort_values(by=['a'],ignore_index=True) is unhelpful in this case.


Solution

  • One way to use is natsorted with iloc :

    #pip install natsort
    from natsort import natsorted
    ​
    out = df.iloc[natsorted(range(len(df)), key=lambda x: df.loc[x, "a"])]
    

    Or even shorter, as suggested by @Stef, use natsort_key as a key of sort_values :

    from natsort import natsort_key
    
    out = df.sort_values(by="a", key=natsort_key, ignore_index=True)
    

    Output :

    print(out)
                 a  b
    0    abc_1.2.6  1
    1    abc_1.2.7  3
    2    abc_1.2.9  4
    3   abc_1.2.60  2
    4    abc_1.3.0  5
    5   abc_1.3.10  6
    6   abc_1.3.11  8
    7  abc_1.3.100  7