Check for backward version changes

I have a dataframe like this:

api_spec_id  info_version
124          2.0.0
124          2.0.0
124          1.0.0
124          1.5.6
124          0.4.5
345          v2alpha1
345          v1alpha1
348          1.0.0-Snapshot
348          1.0.0-Snapshot

I want to compare consecutive info_version values and count how many times the version goes backward, for example from 2.0.0 to 1.0.0 or from v2 to v1. I am not sure how to do this, or whether I need the packaging Version class to compare them.

In my expected output, I would like, for each api_spec_id, the number of times this happens. It would look like:

api_spec_id  count
124          2
345          1
348          0

Any suggestions or ideas on how to achieve this would be much appreciated.

Solution

I would use the packaging library to handle the version comparisons, then a custom groupby.apply that counts, within each api_spec_id, how many versions are lower than the one right before them:

from packaging.version import Version

out = (df['info_version'].apply(Version)
 .groupby(df['api_spec_id'])
 .apply(lambda s: s.iloc[1:].lt(s.shift().iloc[1:]).sum())
)

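Output for ids 124 and 345. This first version assumes every string is a valid PEP 440 version; 1.0.0-Snapshot is not, so with the rows for 348 included, Version raises InvalidVersion (handled in the next section):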

api_spec_id
124    2
345    1
Name: info_version, dtype: int64
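The lambda compares each version with the one just before it in the same group and sums the True values; iloc[1:] drops the first row of each group, which has no predecessor after shift(). Because lt is a strict comparison, repeated versions such as the two 2.0.0 rows are not counted. The same count in plain Python, as a sketch (count_backward is a hypothetical helper name, not part of pandas or packaging):

```python
from packaging.version import Version


def count_backward(versions):
    """Count how many times a version is lower than the one right before it.

    Hypothetical helper that mirrors the per-group lambda above.
    """
    parsed = [Version(v) for v in versions]
    # Pair each version with its predecessor and count strict decreases.
    return sum(curr < prev for prev, curr in zip(parsed, parsed[1:]))


print(count_backward(["2.0.0", "2.0.0", "1.0.0", "1.5.6", "0.4.5"]))  # 2
print(count_backward(["v2alpha1", "v1alpha1"]))  # 1
```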

Handling invalid version numbers

You can wrap the parsing in a custom function that falls back to a default version number when the string is not a valid version:


from packaging.version import Version, InvalidVersion 

def version(s):
    try:
        return Version(s)
    except InvalidVersion:
        return Version('0')

out = (df['info_version'].apply(version)
 .groupby(df['api_spec_id'])
 .apply(lambda s: s.iloc[1:].lt(s.shift().iloc[1:]).sum())
)

Output:

api_spec_id
124    2
345    1
348    0
Name: info_version, dtype: int64
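One caveat with this fallback: because every invalid string becomes Version('0'), an invalid version that follows a valid one is counted as a step backward, even when it is really a newer build. A minimal check (the version function is repeated so the snippet runs on its own; the input strings are illustrative):

```python
from packaging.version import Version, InvalidVersion


def version(s):
    # Same fallback as above: any invalid string becomes Version('0').
    try:
        return Version(s)
    except InvalidVersion:
        return Version('0')


parsed = [version(v) for v in ["1.0.0", "2.0.0-Snapshot"]]
print(parsed)                 # [<Version('1.0.0')>, <Version('0')>]
print(parsed[1] < parsed[0])  # True: counted as going backward
```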

Alternatively, when parsing fails, you can extract the numeric x.y.z part of the string and fall back to a default only if nothing matches:


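import re

from packaging.version import Version, InvalidVersion

def version(s):
    try:
        return Version(s)
    except InvalidVersion:
        # keep the first dotted run of numbers, e.g. '1.0.0' from '1.0.0-Snapshot'
        m = re.search(r'\d+(?:\.\d+)*', s)
        return Version(m.group()) if m else Version('0')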

Example:

version('1.0.0-Snapshot')
# <Version('1.0.0')>
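Putting the pieces together, here is a sketch that combines this kind of fallback with the groupby from the first snippet and runs it on the question's sample data. The column name api_spec_id follows the answer's code, and the regex \d+(?:\.\d+)* is one reasonable choice for pulling out the dotted number, not the only one:

```python
import re

import pandas as pd
from packaging.version import Version, InvalidVersion


def version(s):
    # Parse as PEP 440; on failure, keep the first dotted run of numbers.
    try:
        return Version(s)
    except InvalidVersion:
        m = re.search(r'\d+(?:\.\d+)*', s)
        return Version(m.group()) if m else Version('0')


# Sample data from the question.
df = pd.DataFrame({
    "api_spec_id": [124, 124, 124, 124, 124, 345, 345, 348, 348],
    "info_version": ["2.0.0", "2.0.0", "1.0.0", "1.5.6", "0.4.5",
                     "v2alpha1", "v1alpha1", "1.0.0-Snapshot", "1.0.0-Snapshot"],
})

# Count, per id, how many versions are lower than the previous one.
out = (df['info_version'].apply(version)
       .groupby(df['api_spec_id'])
       .apply(lambda s: s.iloc[1:].lt(s.shift().iloc[1:]).sum()))
print(out)  # expected counts: 124 -> 2, 345 -> 1, 348 -> 0
```

This reproduces the counts the question asks for: the two 1.0.0-Snapshot rows both parse to 1.0.0, so id 348 gets 0.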