maximum value of a unique string in a list


Hi, I'm trying to build a list that contains, for each unique string prefix in a list, the entry with the maximum numeric value.

example:

a = ['DS10.json', 'DS11.json', 'DT4.json', 'DT5.json', 'DT6.json', 'CJ6.json', 'CJ7.json']

should return me a list of the following:

['DS11.json', 'DT6.json', 'CJ7.json']

I have tried the following code:

def j(l):
    p = []
    for i in l:
        digcode = i.split('.')[0]
        if any(s.startswith(digcode[:2]) for s in p):  # this prefix is already in the list
            if digcode[2:] > p[[n for n, l in enumerate(p) if l.startswith(digcode[:2])][0]][2:]:
                p.pop([n for n, l in enumerate(p) if l.startswith(digcode[:2])][0])
                p.append(digcode)
            else:
                pass
        else:
            p.append(digcode)
    return p

But when I apply it to a larger sample, it gives the wrong result:

>>> o = ['AS6.json', 'AS7.json', 'AS8.json', 'AS9.json', 'BS1.json', 'BS2.json', 'BS3.json', 'BS4.json', 'BS5.json', 'CS1.json', 'CS2.json', 'CS3.json', 'CS4.json', 'CS5.json', 'CS6.json', 'DS10.json', 'DS11.json', 'DS4.json', 'DS5.json', 'DS6.json', 'DS7.json', 'DS8.json', 'DS9.json', 'ES4.json', 'ES5.json', 'ES6.json', 'FS5.json', 'FS6.json', 'FS7.json', 'FS8.json', 'MS4.json', 'MS5.json', 'MS6.json', 'MS7.json', 'MS8.json', 'MS9.json', 'NR1.json', 'NR2.json', 'NR3.json', 'NR4.json', 'NR5.json', 'NR6.json', 'NR7.json', 'NR8.json', 'VR1.json', 'VR2.json', 'VR3.json', 'VR4.json', 'VR5.json', 'VR6.json', 'VR7.json', 'VR8.json', 'XS11.json', 'XS9.json']

>>> j(o)
['AS9', 'BS5', 'CS6', 'DS9', 'ES6', 'FS8', 'MS9', 'NR8', 'VR8', 'XS9']

which is incorrect: for example, the input contains XS11 and DS11, but the result has XS9 and DS9.

I would appreciate it if someone could help me fix this or suggest a simpler solution. Thank you.


Solution

  • You are comparing strings, and Python orders strings character by character: '9' is greater than '11' because the character '9' sorts after the character '1'. You'll have to convert the numeric parts to integers before comparing them.
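
A quick check of that comparison, as a minimal sketch (the variable names are only for illustration):

```python
# Strings are compared character by character, so the first
# characters '9' and '1' decide the result.
as_text = '9' > '11'
# Converted to integers, the comparison is numeric.
as_numbers = int('9') > int('11')

print(as_text)     # True
print(as_numbers)  # False
```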

    I'd use a dictionary to map prefixes to the maximum number:

    def find_latest(lst):
        prefixes = {}
        for entry in lst:
            code, value = entry[:2], int(entry.partition('.')[0][2:])
            if value > prefixes.get(code, (float('-inf'), ''))[0]:
                prefixes[code] = (value, entry)
        return [entry for value, entry in prefixes.values()]
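
To make the parsing step in that function concrete, here is a small sketch. `split_entry` is a hypothetical helper used only for illustration; it assumes, as the function above does, that every name starts with a two-character prefix followed by digits and an extension.

```python
def split_entry(entry):
    # Hypothetical helper: the same parsing that find_latest does inline.
    stem = entry.partition('.')[0]   # 'DS11.json' -> 'DS11'
    return entry[:2], int(stem[2:])  # two-character prefix, numeric part

print(split_entry('DS11.json'))  # ('DS', 11)
print(split_entry('CJ7.json'))   # ('CJ', 7)

# The default passed to dict.get starts every unseen prefix at -infinity,
# so the first entry for a prefix always wins the comparison.
print(7 > float('-inf'))  # True
```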
    

    This is also far more efficient. Your version rescans the result list for every input element, so the work grows with N times the number of distinct prefixes and approaches N^2 when most entries have their own prefix. The dictionary version does a single lookup per entry, so it processes the list in N steps. In that worst case, 10 elements cost on the order of 100 tests with your code but only 10 here.

    Demo:

    >>> sample = ['AS6.json', 'AS7.json', 'AS8.json', 'AS9.json', 'BS1.json', 'BS2.json', 'BS3.json', 'BS4.json', 'BS5.json', 'CS1.json', 'CS2.json', 'CS3.json', 'CS4.json', 'CS5.json', 'CS6.json', 'DS10.json', 'DS11.json', 'DS4.json', 'DS5.json', 'DS6.json', 'DS7.json', 'DS8.json', 'DS9.json', 'ES4.json', 'ES5.json', 'ES6.json', 'FS5.json', 'FS6.json', 'FS7.json', 'FS8.json', 'MS4.json', 'MS5.json', 'MS6.json', 'MS7.json', 'MS8.json', 'MS9.json', 'NR1.json', 'NR2.json', 'NR3.json', 'NR4.json', 'NR5.json', 'NR6.json', 'NR7.json', 'NR8.json', 'VR1.json', 'VR2.json', 'VR3.json', 'VR4.json', 'VR5.json', 'VR6.json', 'VR7.json', 'VR8.json', 'XS11.json', 'XS9.json']
    >>> def find_latest(lst):
    ...     prefixes = {}
    ...     for entry in lst:
    ...         code, value = entry[:2], int(entry.partition('.')[0][2:])
    ...         if value > prefixes.get(code, (float('-inf'), ''))[0]:
    ...             prefixes[code] = (value, entry)
    ...     return [entry for value, entry in prefixes.values()]
    ... 
    >>> find_latest(sample)
    ['FS8.json', 'VR8.json', 'AS9.json', 'MS9.json', 'BS5.json', 'CS6.json', 'XS11.json', 'NR8.json', 'DS11.json', 'ES6.json']