python pandas compare version

Check for backward upgrade versions


I have a dataframe like this:

api_spec_id   info_version
124             2.0.0
124             2.0.0
124             1.0.0
124             1.5.6
124             0.4.5
345             v2alpha1
345             v1alpha1
348             1.0.0-Snapshot
348             1.0.0-Snapshot

I want to compare consecutive info_version values and count how many times the version goes backward, such as from 2.0.0 to 1.0.0 or from v2 to v1. I am not sure how to do this, or whether I will have to use the Version class from packaging to compare them.

In my expected output, I would like a count, per api_spec_id, of how many times this happens. It would look like:

api_spec_id     count
124                2
345                1
348                0

Any suggestions or ideas on how this could be achieved would be greatly appreciated.


Solution

  • I would use the packaging library to handle version number comparisons automatically, then a custom groupby.apply:

    from packaging.version import Version
    
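    # Parse each string into a Version, then, within each id, count the rows
    # whose version is lower than the version on the previous row.
    out = (df['info_version'].apply(Version)
     .groupby(df['api_spec_id'])
     .apply(lambda s: s.iloc[1:].lt(s.shift().iloc[1:]).sum())
    )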
    

    Output:

    api_spec_id
    124    2
    345    1
    Name: info_version, dtype: int64
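
The lambda above compares every version with the one on the row before it and counts the cases where the current one is lower. As a rough illustration of that logic outside pandas, here is a minimal sketch; `count_downgrades` is a hypothetical helper name, not part of packaging or pandas, and it assumes every string is a valid version:

```python
from packaging.version import Version

def count_downgrades(versions):
    """Count how often a version is lower than the one directly before it."""
    parsed = [Version(v) for v in versions]
    # Pair each version with its successor: (v0, v1), (v1, v2), ...
    return sum(curr < prev for prev, curr in zip(parsed, parsed[1:]))

print(count_downgrades(["2.0.0", "2.0.0", "1.0.0", "1.5.6", "0.4.5"]))  # 2
print(count_downgrades(["v2alpha1", "v1alpha1"]))  # 1
```

Equal neighbours (2.0.0 followed by 2.0.0) are not counted, which matches the strict `lt` comparison in the pandas version.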
    

    Handling invalid version numbers

    You can use a custom function that falls back to a default version number when the string is not a valid version (for example 1.0.0-Snapshot, which raises InvalidVersion):

    
    from packaging.version import Version, InvalidVersion 
    
    def version(s):
        try:
            return Version(s)
        except InvalidVersion:
            return Version('0')
    
    out = (df['info_version'].apply(version)
     .groupby(df['api_spec_id'])
     .apply(lambda s: s.iloc[1:].lt(s.shift().iloc[1:]).sum())
    )
    

    Output:

    api_spec_id
    124    2
    345    1
    348    0
    Name: info_version, dtype: int64
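
For anyone who wants to reproduce this end to end, the following is a sketch that rebuilds the sample data from the question and reshapes the result into the two-column layout that was asked for. The column name `count` and the final `rename`/`reset_index` step are additions for illustration, not part of the answer above:

```python
import pandas as pd
from packaging.version import Version, InvalidVersion

# Sample data from the question (column names as used in the answer).
df = pd.DataFrame({
    "api_spec_id": [124, 124, 124, 124, 124, 345, 345, 348, 348],
    "info_version": ["2.0.0", "2.0.0", "1.0.0", "1.5.6", "0.4.5",
                     "v2alpha1", "v1alpha1",
                     "1.0.0-Snapshot", "1.0.0-Snapshot"],
})

def version(s):
    """Parse s, falling back to Version('0') for invalid strings."""
    try:
        return Version(s)
    except InvalidVersion:
        return Version("0")

out = (df["info_version"].apply(version)
       .groupby(df["api_spec_id"])
       .apply(lambda s: s.iloc[1:].lt(s.shift().iloc[1:]).sum())
       .rename("count")   # name the value column
       .reset_index())    # turn the group key back into a column
print(out)
```

Note that the result depends on row order: the rows are compared in the order they appear in the dataframe, so sort first if that order is not already chronological.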
    

    Alternatively, when parsing fails you can extract the x.y.z part of the string, and fall back to a default only if nothing matches:

    
    import re
    
    def version(s):
        try:
            return Version(s)
        except InvalidVersion:
            # raw string, escaped dot, and \d+ so multi-digit parts (e.g. 1.10.0) match
            m = re.search(r'(?:\d+\.)*\d+', s)
            return Version(m.group()) if m else Version('0')
    

    Example:

    version('1.0.0-Snapshot')
    # <Version('1.0.0')>
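
The choice between the two fallbacks can change the count. The sketch below compares them on two made-up labels (`2.0.0-Snapshot` followed by `1.0.0-Snapshot`, which are not in the question's data); the names `version_default` and `version_extract` are hypothetical, and the pattern is written so that multi-digit components are kept:

```python
import re
from packaging.version import Version, InvalidVersion

def version_default(s):
    """Invalid strings all collapse to Version('0')."""
    try:
        return Version(s)
    except InvalidVersion:
        return Version("0")

def version_extract(s):
    """Invalid strings keep their leading dotted-number part, if any."""
    try:
        return Version(s)
    except InvalidVersion:
        m = re.search(r"(?:\d+\.)*\d+", s)
        return Version(m.group()) if m else Version("0")

labels = ["2.0.0-Snapshot", "1.0.0-Snapshot"]  # illustrative values only

# With the default fallback both labels become 0, so no downgrade is seen.
print(version_default(labels[1]) < version_default(labels[0]))  # False

# With extraction the labels become 2.0.0 and 1.0.0, so the downgrade is seen.
print(version_extract(labels[1]) < version_extract(labels[0]))  # True
```

Extraction is therefore the safer option when many labels carry suffixes such as -Snapshot, while the plain default is enough when invalid labels are rare or should simply be ignored.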