I am a beginner in Python. I am trying to sort a dataframe on versions
column -
print(df)
versions memory
0 10.0.0 107.07
1 10.0.0.1 110.11
2 10.0.2 110.56
3 10.0.4 116.27
4 10.1.0 116.17
5 10.1.2 113.98
6 10.1.4 113.27
7 10.2.0 103.32
8 9.9.2 112.02
9 9.9.2.3 112.28
10 9.9.4 114.45
11 9.9.4.1 114.32
such that the resulting dataframe is -
versions memory
0 9.9.2 112.02
1 9.9.2.3 112.28
2 9.9.4 114.45
3 9.9.4.1 114.32
4 10.0.0 107.07
5 10.0.0.1 110.11
6 10.0.2 110.56
7 10.0.4 116.27
8 10.1.0 116.17
9 10.1.2 113.98
10 10.1.4 113.27
11 10.2.0 103.32
The datatype of the versions
columns is object
.
I tried doing -
df = df.sort_values('versions')
But that leaves the dataframe unchanged.
Also tried doing -
from distutils.version import StrictVersion
df = sorted(df['ghost_version'], key=StrictVersion)
but it errors out on ValueError: invalid version number '10.0.0.1'
Can someone please guide?
Thanks
You can create tuples filled by integers, get indices of sorted values by Series.argsort
and change order by DataFrame.iloc
, last for defaut index use DataFrame.reset_index
:
df1 = (df.iloc[df['versions'].apply(lambda x: tuple(map(int, x.split(".")))).argsort()]
.reset_index(drop=True))
Or use helper column with DataFrame.sort_values
and last remove column:
df['tmp'] = df['versions'].apply(lambda x: tuple(map(int, x.split("."))))
df1 = df.sort_values('tmp').drop('tmp', axis=1).reset_index(drop=True)
Also for order of sorted values is possible use LooseVersion
:
from distutils.version import LooseVersion
df1 = df.iloc[df['versions'].apply(LooseVersion).argsort()].reset_index(drop=True)
versions memory
0 9.9.2 112.02
1 9.9.2.3 112.28
2 9.9.4 114.45
3 9.9.4.1 114.32
4 10.0.0 107.07
5 10.0.0.1 110.11
6 10.0.2 110.56
7 10.0.4 116.27
8 10.1.0 116.17
9 10.1.2 113.98
10 10.1.4 113.27
11 10.2.0 103.32