
Concatenate and simplify a list containing number and letter pairs


I have a list of strings representing numbers. I can't use int because some of the numbers have attached letters, like '33a' or '33b'.

['21', '23a', '23b', '23k', '23l', '23x', '25', '33a', '33b', '33c', '33d', '33e', '33f', '34', '34', '35a', '35a']
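A regular expression from the `re` module is one way to split each entry into its numeric part and its letter suffix. This is a minimal sketch. It assumes every entry is digits followed by zero or more lowercase letters. `split_entry` and `ENTRY_RE` are hypothetical names:

```python
import re

# Assumed format: digits, then an optional run of lowercase letters
ENTRY_RE = re.compile(r"(\d+)([a-z]*)")

def split_entry(entry):
    """Split '33a' into (33, 'a') and '34' into (34, '')."""
    match = ENTRY_RE.fullmatch(entry)
    if match is None:
        raise ValueError(f"unexpected entry: {entry!r}")
    return int(match.group(1)), match.group(2)

print(split_entry("33a"))  # (33, 'a')
print(split_entry("34"))   # (34, '')

# The (int, str) tuples also work as a sort key, so '100' sorts after '25'
print(sorted(["100", "25a", "25", "3"], key=split_entry))  # ['3', '25', '25a', '100']
```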

My goal is to concatenate the numbers to one string and separate them using a forward slash.

If a number is repeated and its letters are consecutive in alphabetical order, the run should be collapsed into a range:

['23a'/'23b'] --> '23a-b'

If a number is repeated without letters, it should be listed only once. The same applies to repeated identical number-letter pairs, such as '35a', '35a'.

For the complete example, the desired output looks like this:

'21/23a-b/23k-l/23x/25/33a-f/34/35a'

With the following code I can concatenate the numbers and drop duplicates. However, I can't figure out how to collapse the numbers with letters into ranges as shown above.

numbers = ['21', '23a', '23b', '23k', '23l', '23x', '25', '33a', '33b', '33c', '33d', '33e', '33f', '34', '34', '35a', '35a']

concat_numbers = ""
numbers_set = list(set(numbers))

numbers_set.sort()
for number in numbers_set:
    concat_numbers += number + "/"

print(concat_numbers)
# 21/23a/23b/23k/23l/23x/25/33a/33b/33c/33d/33e/33f/34/35a/
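The trailing slash appears because "/" is appended after every element. `str.join` inserts the separator only *between* elements, so the same deduplicate-and-sort step can be written as:

```python
numbers = ['21', '23a', '23b', '23k', '23l', '23x', '25', '33a', '33b',
           '33c', '33d', '33e', '33f', '34', '34', '35a', '35a']

# set() drops duplicates, sorted() orders them, join() puts "/" only between items
concat_numbers = "/".join(sorted(set(numbers)))
print(concat_numbers)
# 21/23a/23b/23k/23l/23x/25/33a/33b/33c/33d/33e/33f/34/35a
```

Note that this sorts strings character by character. That works here because every number has two digits, but an entry like '100' would sort before '21'.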

Any hints on how to achieve this in the most pythonic way?


Solution

  • This can be done by grouping the letters of each number with defaultdict(list) and then rebuilding the output:

    from collections import defaultdict
    from itertools import takewhile

    data = ['21', '23a', '23b', '23k', '23l', '23x', '25', '33a', '33b',
            '33c', '33d', '33e', '33f', '34', '34', '35a', '35a']
    # sort first so each number's letters arrive in alphabetical order
    # (string sort; fine here because all numbers have two digits)
    data.sort()

    d = defaultdict(list)
    for n in data:
        # split into the number part and the letter part
        number = ''.join(takewhile(str.isdigit, n))
        letter = n[len(number):]
        # group the letters by number
        d[number].append(letter)

    print(d)
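One variation on the grouping step collects each number's suffixes in a `set` instead of a `list`, which discards duplicates such as '34', '34' up front. This sketch assumes duplicates carry no meaning. It also orders the keys with `int` so that a hypothetical '100' would come after '25'. The names `groups` and `grouped` are illustrative:

```python
from collections import defaultdict
from itertools import takewhile

data = ['21', '23a', '23b', '23k', '23l', '23x', '25', '33a', '33b',
        '33c', '33d', '33e', '33f', '34', '34', '35a', '35a']

groups = defaultdict(set)  # a set silently drops repeated suffixes
for entry in data:
    number = ''.join(takewhile(str.isdigit, entry))
    groups[number].add(entry[len(number):])

# Order keys numerically and each number's suffixes alphabetically
grouped = {number: sorted(groups[number]) for number in sorted(groups, key=int)}
print(grouped)
```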
    

    We now have a dict with the numbers as keys and lists of their letters as values. These lists still need to be collapsed into ranges:

    # join runs of consecutive letters into ranges
    # assumes every item in l is '' or a single letter, and that a number never
    # appears both with and without a letter (ord('') would raise TypeError)
    def unify(l):
        u = [""]
        # remember start/end values of the current run
        first = l[0]
        last = l[0]
        # iterate the list of letters given
        for letter in l:
            # for same letters or a->b letters, move last forward
            if last == letter or ord(last) == ord(letter) - 1:
                last = letter
            else:
                # run stopped: add it as a single letter or as a range
                u.append(first if last == first else f"{first}-{last}")
                # start over with new values
                first = letter
                last = letter
        # the final run is never added inside the loop, so add it now
        u.append(first if last == first else f"{first}-{last}")

        # ignore empty results in u (numbers without letters)
        return ",".join(w for w in u if w)
    
    # unify letters
    for key,value in d.items():
        d[key] = unify(value)
    
    print(d)
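A more compact way to find runs of consecutive letters borrows the `itertools.groupby` idiom for runs of consecutive integers: within a run, `ord(letter) - index` stays constant. This is a sketch, not a drop-in replacement. It assumes each suffix is empty or a single letter. `unify_groupby` is a hypothetical name, and it returns the same comma-separated format as `unify`:

```python
from itertools import groupby

def unify_groupby(letters):
    """Collapse e.g. ['a', 'b', 'k', 'l', 'x'] into 'a-b,k-l,x'."""
    # Remove duplicates and empty suffixes, then sort alphabetically
    unique = sorted(set(letters) - {""})
    parts = []
    # Within a consecutive run, ord(letter) - position is constant
    for _, run in groupby(enumerate(unique), key=lambda pair: ord(pair[1]) - pair[0]):
        run_letters = [letter for _, letter in run]
        if len(run_letters) == 1:
            parts.append(run_letters[0])
        else:
            parts.append(f"{run_letters[0]}-{run_letters[-1]}")
    return ",".join(parts)

print(unify_groupby(['a', 'b', 'k', 'l', 'x']))  # a-b,k-l,x
print(unify_groupby(['a', 'a']))                  # a
print(repr(unify_groupby(['', ''])))              # ''
```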
    

    Then build the final output string:

    r = "/".join(f"{ky}{v}" for ky,vl in d.items() for v in vl.split(","))
    print(r)
    

    Output:

    # collected split key/values
    defaultdict(<class 'list'>, 
    {'21': [''], '23': ['a', 'b', 'k', 'l', 'x'], 
     '25': [''], '33': ['a', 'b', 'c', 'd', 'e', 'f'], 
     '34': ['', ''], '35': ['a', 'a']})
    
    # unified values
    defaultdict(<class 'list'>, 
    {'21': '', '23': 'a-b,k-l,x', '25': '', 
     '33': 'a-f', '34': '', '35': 'a'})
    
    # as string
    21/23a-b/23k-l/23x/25/33a-f/34/35a
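Putting the steps together gives one compact end-to-end sketch. It assumes each entry is digits plus an optional single lowercase letter. `compress` is a hypothetical name. Unlike the answer above, a number that appears both bare and with letters (e.g. '34' and '34a') is kept in both forms here. That is an assumption about the desired behaviour, not something the question specifies:

```python
import re
from collections import defaultdict
from itertools import groupby

def compress(entries):
    """Build e.g. '21/23a-b/23k-l/23x/25/33a-f/34/35a' from a list of entries."""
    suffixes = defaultdict(set)  # number -> set of suffixes ('' for a bare number)
    for entry in entries:
        match = re.fullmatch(r"(\d+)([a-z]?)", entry)
        if match is None:
            raise ValueError(f"unexpected entry: {entry!r}")
        number, letter = match.groups()
        suffixes[number].add(letter)

    parts = []
    for number in sorted(suffixes, key=int):
        if "" in suffixes[number]:
            parts.append(number)  # bare number, listed once
        letters = sorted(suffixes[number] - {""})
        # Runs of consecutive letters share the same ord(letter) - position
        for _, run in groupby(enumerate(letters), key=lambda p: ord(p[1]) - p[0]):
            run_letters = [letter for _, letter in run]
            if len(run_letters) == 1:
                parts.append(number + run_letters[0])
            else:
                parts.append(f"{number}{run_letters[0]}-{run_letters[-1]}")
    return "/".join(parts)

data = ['21', '23a', '23b', '23k', '23l', '23x', '25', '33a', '33b',
        '33c', '33d', '33e', '33f', '34', '34', '35a', '35a']
print(compress(data))  # 21/23a-b/23k-l/23x/25/33a-f/34/35a
```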