I have a dataframe like this:
id info_version
124 2.0.0
124 2.0.0
124 1.0.0
124 1.5.6
124 0.4.5
345 v2alpha1
345 v1alpha1
348 1.0.0-Snapshot
348 1.0.0-Snapshot
I want to compare consecutive info_version values within each id and check how many times the version goes backward, e.g. from 2.0.0 to 1.0.0 or from v2 to v1. I am not sure how to do this, or whether I have to use the packaging Version class to compare them.
In the expected output, I would like, for each api_spec_id, the number of times this happens:
id count
124 2
345 1
348 0
Any suggestions on how to achieve this would be greatly appreciated.
I would use the packaging library to handle version number comparisons automatically, then a custom groupby.apply:
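from packaging.version import Version
# parse each string into a comparable Version, then within each api_spec_id
# count the rows whose version is lower than the previous row's
out = (df['info_version'].apply(Version)
         .groupby(df['api_spec_id'])
         .apply(lambda s: s.iloc[1:].lt(s.shift().iloc[1:]).sum())
      )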
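Output (with only ids 124 and 345 in the data, because 1.0.0-Snapshot is not a valid PEP 440 version and Version raises InvalidVersion on it):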
api_spec_id
124 2
345 1
Name: info_version, dtype: int64
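A quick check of how packaging's Version orders the question's strings may help explain why the comparison works: a leading v and the alpha spelling are accepted and normalized, while 1.0.0-Snapshot does not fit the PEP 440 grammar. This is a minimal sketch; the helper is_backward is hypothetical, for illustration only.

```python
from packaging.version import Version, InvalidVersion

def is_backward(prev, cur):
    """Hypothetical helper: True if version string cur is lower than prev."""
    return Version(cur) < Version(prev)

print(is_backward('2.0.0', '1.0.0'))        # True: going back
print(is_backward('1.0.0', '1.5.6'))        # False: going forward
print(is_backward('2.0.0', '2.0.0'))        # False: equal is not a step back
print(is_backward('v2alpha1', 'v1alpha1'))  # True: parsed as 2a1 and 1a1

# '1.0.0-Snapshot' is not a valid PEP 440 version, so parsing raises
try:
    Version('1.0.0-Snapshot')
except InvalidVersion as exc:
    print('invalid:', exc)
```

Because equal versions do not count as a step back, id 124 (2.0.0, 2.0.0, 1.0.0, 1.5.6, 0.4.5) gives 2, as in the expected output.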
To also handle strings that are not valid versions (such as 1.0.0-Snapshot), you can use a custom function that returns a default version number when parsing raises InvalidVersion:
from packaging.version import Version, InvalidVersion
def version(s):
    try:
        return Version(s)
    except InvalidVersion:
        # not a valid PEP 440 version: use 0 as a default
        return Version('0')
out = (df['info_version'].apply(version)
         .groupby(df['api_spec_id'])
         .apply(lambda s: s.iloc[1:].lt(s.shift().iloc[1:]).sum())
      )
Output:
api_spec_id
124 2
345 1
348 0
Name: info_version, dtype: int64
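For reference, here is a self-contained sketch of the fallback approach that can be run as-is. It rebuilds the sample data (assuming the id column is named api_spec_id, as in the code above) and replaces the shift-based lambda with a hypothetical helper, count_backward, that compares consecutive pairs in plain Python. Like the original, it assumes the rows within each id are already in the order the versions appeared.

```python
import pandas as pd
from packaging.version import Version, InvalidVersion

# Sample data from the question; column name 'api_spec_id' is an assumption
df = pd.DataFrame({
    'api_spec_id': [124, 124, 124, 124, 124, 345, 345, 348, 348],
    'info_version': ['2.0.0', '2.0.0', '1.0.0', '1.5.6', '0.4.5',
                     'v2alpha1', 'v1alpha1',
                     '1.0.0-Snapshot', '1.0.0-Snapshot'],
})

def parse_version(s):
    """Parse s as a PEP 440 version, falling back to Version('0') if invalid."""
    try:
        return Version(s)
    except InvalidVersion:
        return Version('0')

def count_backward(versions):
    """Hypothetical helper: count consecutive pairs where the later version is lower."""
    vs = list(versions)
    return sum(cur < prev for prev, cur in zip(vs, vs[1:]))

out = (df['info_version'].map(parse_version)
         .groupby(df['api_spec_id'])
         .apply(count_backward))
print(out)
```

This gives 2, 1 and 0 for ids 124, 345 and 348, matching the expected output; the two Snapshot rows both become Version('0'), so id 348 never goes backward.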
Alternatively, when parsing fails you can extract the x.y.z part of the string, and only fall back to the default if there is no match:
import re
def version(s):
    try:
        return Version(s)
    except InvalidVersion:
        # take the first run of dot-separated numbers, e.g. '1.0.0'
        m = re.search(r'\d+(?:\.\d+)*', s)
        return Version(m.group()) if m else Version('0')
Example:
version('1.0.0-Snapshot')
# <Version('1.0.0')>
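A standalone sketch of the regex fallback, with the pattern written as a raw string and with an escaped dot, so that each component can have several digits (e.g. 10.2.3). The function name lenient_version and the extra test strings are illustrative assumptions, not part of the question's data.

```python
import re
from packaging.version import Version, InvalidVersion

# One or more digit groups separated by literal dots, e.g. '1.0.0' or '10.2.3'
RELEASE_RE = re.compile(r'\d+(?:\.\d+)*')

def lenient_version(s):
    """Parse s as a PEP 440 version; on failure use its first x.y.z-like part,
    or Version('0') if it contains no digits."""
    try:
        return Version(s)
    except InvalidVersion:
        m = RELEASE_RE.search(s)
        return Version(m.group()) if m else Version('0')

print(lenient_version('1.0.0-Snapshot'))  # 1.0.0
print(lenient_version('10.2.3-beta.x'))   # 10.2.3
print(lenient_version('latest'))          # 0
```

Valid strings such as v2alpha1 still go through Version directly, so the regex only affects inputs that packaging rejects.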