find if a semantic version is superset of of another version python

I have a use case where I have a list of Java maven semantic ranges trying to find the list of versions that I can exclude in my application.

Example:

SemVersion1 = "[,1.8.8.1)"
SemVersion2 = "[,1.8.8.2)"
SemVersion3 = "[,1.8.8.3)"
SemVersion4 = "[1.0.0, 1.4.4)"
SemVersion5 = "[,1.3.11),[,1.7.0), [,1.8.0)"

How to filter the various SemVersion to only keep the one that includes all other versions. In this case, SemVersion3 will be picked up because it includes all the versions.

Thinking about something like this in python:

        output = []
        for a in SemVersions:
            is_redundent = False
            for b in a:
              if affected_versions[a].issuperset(affected_versions[b]):
              is_redundent = True
              break
        if not is_redundent:
          output.append(b)

The problem is that I would need to inflate the SemVersions to get the affected_versions, is there an easier way to do this.

Solution

Steps:

convert your ranges into a more strict notation to eliminate ',1.2.3.4' and '1.0,2.3.4' (we need exact lower bounds for the last step)
convert your strings into actual Version objects for ease of comparisons (you could also do some conversion to numeral tuples but they break on versions containing non-numeric characters)

from packaging.version import Version
from functools import cmp_to_key
from collections import defaultdict

def adj_lengths(s, c=3):
    """Add missing .0 to a version up to a predefined count '.'-count.
    Example:  '9.0.2' with c == 3 -> '9.0.2.0'"""

    cnt = s.count(".")
    if c>cnt:
        s = s + ".0"*(c-cnt)
    return s


def split_mult(s):
    """Make versions more precise by replacing implicit range starts.
    Uses adj_lengths to add missing .0 to equalize version lengths.
    Handles splitting multiple given ranges as well. 
    Returns iterable.
    Example: '[,1.3.11),[,1.7.0), [,1.8.0)' 
             -> (['0.0.0.0', '1.3.11.0'],['0.0.0.0', '1.7.0.0'], ['0.0.0.0', '1.8.0.0'])"""

    s = s.replace("[,",f"[0,")
    s1 = [ adj_lengths(b) for b in (t.strip("[()] ") for t in s.split(","))]

    yield from [ s1[i:i+2] for i in range(0,len(s1),2)]


def make_version(l):
    """Transform text-list into Version-tuple."""

    return ( Version(l[0]), Version(l[1]) )

Program:

vers = ["[,1.8.8.1)",   
        "[,1.8.8.2)",   
        "[,1.8.8.3)",  
        "[1.0.0, 1.4.4)", 
        "[,1.3.11),[,1.7.0), [,1.8.0)"]

# preprocessing to make them nicer
vers2 = []
for tmp in (split_mult(a) for a in vers) :
    vers2.extend( (make_version(k) for k in tmp) )
print(vers2)

# bring into order
vers2.sort()

# use the lower bound as key - append alle upper bounds to a defaultdict(list)  
d = defaultdict(list)
for fr,to in vers2:
    d[fr].append(to)

# simplify lower bound:[list of upper bounds] to (lower bound, max(upper bound list values))
vers3 = [ (k,max(v)) for k,v in d.items()]

# eliminate range that lie inside the 1st elements range
for item in vers3[1:][:]:
    if item[0] >= vers3[0][0] and item[1] <= vers3[0][1]:
        vers3.remove(item)

print(vers3)

Output:

[(<Version('0.0.0.0')>, <Version('1.8.8.3')>)]

If you have more then one resulting range you would have to do the last step for every element - not just the first one, f.e. when your data in the last step is like:

[(<Version('0.0.0.0')>, <Version('1.8.8.3')>), 
 (<Version('1.0.0.0')>, <Version('2.8.8.3')>), 
 (<Version('1.2.0.0')>, <Version('1.8.8.3')>), ] # last elem inside before-last elem