Concatenate and simplify a list containing number and letter pairs

I have a list of strings representing numbers. I can't convert them with int() because some of the numbers have letters attached, like '33a' or '33b'.

['21', '23a', '23b', '23k', '23l', '23x', '25', '33a', '33b', '33c', '33d', '33e', '33f', '34', '34', '35a', '35a' ]

My goal is to concatenate the numbers into one string, separated by forward slashes.

If a number is repeated and its additional letters continue in alphabetical order, the representation should be simplified as follows:

['23a'/'23b'] --> '23a-b'

If a number is repeated without additional letters, it should be listed only once. The same applies to repeating identical pairs of numbers and additional letters.

For the complete example, the desired output looks like this:

'21/23a-b/23k-l/23x/25/33a-f/34/35a'

With the following code I can concatenate the numbers and exclude duplicates, but I cannot work out how to simplify the numbers with letters as in the example above.

numbers = ['21', '23a', '23b', '23k', '23l', '23x', '25', '33a', '33b', '33c', '33d', '33e', '33f', '34', '34', '35a', '35a' ]

concat_numbers = ""
numbers_set = list(set(numbers))

numbers_set.sort()
for number in numbers_set: 
    concat_numbers += number + "/"
    
print(concat_numbers)
>>> '21/23a/23b/23k/23l/23x/25/33a/33b/33c/33d/33e/33f/34/35a/'

Any hints on how to achieve this in the most pythonic way?

Solution

This can be done by collecting the letters for each number in a defaultdict(list) and then rebuilding your output from it, like so:

from collections import defaultdict
from itertools import takewhile

data = ['21', '23a', '23b', '23k', '23l', '23x', '25', '33a', '33b', 
        '33c', '33d', '33e', '33f', '34', '34', '35a', '35a']
data.sort() # easier if letters are sorted - so sort it

d = defaultdict(list)
for n in data:
    # split in number/letters
    number = ''.join(takewhile(str.isdigit, n))
    letter = n[len(number):]
    # add to dict
    d[number].append(letter)

print(d)
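One caveat with data.sort(): it compares strings character by character, so '100' would sort before '21' as soon as the numbers differ in digit count. A minimal sketch of a numeric sort key, assuming every entry starts with at least one digit; natural_key is a hypothetical helper name, not part of the code above:

```python
import re

def natural_key(entry):
    """Sort key: numeric part as int, then the letter suffix (hypothetical helper)."""
    # Assumes the entry starts with digits; fullmatch returns None otherwise.
    number, suffix = re.fullmatch(r"(\d+)(.*)", entry).groups()
    return int(number), suffix

sample = ['100', '21', '9b', '9a', '23a']
print(sorted(sample))                   # ['100', '21', '23a', '9a', '9b']
print(sorted(sample, key=natural_key))  # ['9a', '9b', '21', '23a', '100']
```

For the example data all numbers have two digits, so the plain sort is enough; with mixed lengths, data.sort(key=natural_key) should keep the rest of the approach unchanged.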

We now have a dict with the number parts as keys and, for each number, a list of all letters seen as the value. That still needs to be cleaned up:

# concat letters that follow each other
def unify(l):
    u = []
    # remember start/end values
    first = l[0]
    last = l[0]
    # iterate the list of letters given
    for letter in l:
        # only single letters can extend a range (avoids ord('') errors)
        consecutive = (len(last) == 1 and len(letter) == 1
                       and ord(last) == ord(letter) - 1)
        # for same letters or a->b letters, move last forward
        if last == letter or consecutive:
            last = letter
        else:
            # letter range stopped, add single letter or range to list
            u.append(first if last == first else f"{first}-{last}")
            # start over with new values
            first = letter
            last = letter
    # add the final single letter or range
    u.append(first if last == first else f"{first}-{last}")

    # an empty entry stands for the bare number without a letter
    return ",".join(u)

# unify letters
for key,value in d.items():
    d[key] = unify(value)

print(d)
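The run detection in unify can also be sketched with itertools.groupby: in a sorted list of distinct letters, consecutive letters all share the same difference between code point and list position. This is an alternative sketch, not the code above; collapse_letters is a hypothetical name, and it assumes every suffix is exactly one character (empty suffixes would have to be handled before calling it):

```python
from itertools import groupby

def collapse_letters(letters):
    """Collapse single letters into ranges, e.g. ['a', 'b', 'd'] -> ['a-b', 'd']."""
    unique = sorted(set(letters))  # drop duplicates, ensure alphabetical order
    ranges = []
    # Consecutive letters share the same (code point - position) difference.
    for _, group in groupby(enumerate(unique), key=lambda pair: ord(pair[1]) - pair[0]):
        run = [letter for _, letter in group]
        ranges.append(run[0] if len(run) == 1 else f"{run[0]}-{run[-1]}")
    return ranges

print(collapse_letters(['a', 'b', 'k', 'l', 'x']))  # ['a-b', 'k-l', 'x']
print(collapse_letters(['a', 'c']))                 # ['a', 'c']
```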

and then construct the final output:

r = "/".join(f"{ky}{v}" for ky,vl in d.items() for v in vl.split(","))
print(r)

Output:

# collected keys/values after splitting
defaultdict(<class 'list'>, 
{'21': [''], '23': ['a', 'b', 'k', 'l', 'x'], 
 '25': [''], '33': ['a', 'b', 'c', 'd', 'e', 'f'], 
 '34': ['', ''], '35': ['a', 'a']})

# unified values
defaultdict(<class 'list'>, 
{'21': '', '23': 'a-b,k-l,x', '25': '', 
 '33': 'a-f', '34': '', '35': 'a'})

# as string
21/23a-b/23k-l/23x/25/33a-f/34/35a
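
If a single reusable function is preferred, the three steps can be folded into one pass over the sorted, de-duplicated (number, letter) pairs. This is only a sketch under assumptions that go beyond the question: each entry is digits followed by at most one lowercase letter, numbers are compared as integers (so leading zeros would be lost), and simplify is a hypothetical name:

```python
import re

def simplify(numbers):
    """Collapse e.g. ['23a', '23b', '25'] into '23a-b/25' (hypothetical helper)."""
    # Split each entry into (int number, letter suffix); the set drops duplicates.
    pairs = set()
    for entry in numbers:
        match = re.fullmatch(r"(\d+)([a-z]?)", entry)
        if match is None:
            raise ValueError(f"unexpected entry: {entry!r}")
        pairs.add((int(match.group(1)), match.group(2)))

    runs = []  # each run is [number, first_letter, last_letter]
    for number, letter in sorted(pairs):
        extends_run = (
            runs and letter and runs[-1][0] == number and runs[-1][2]
            and ord(letter) == ord(runs[-1][2]) + 1
        )
        if extends_run:
            runs[-1][2] = letter  # next letter of the same number: extend the run
        else:
            runs.append([number, letter, letter])  # start a new run

    return "/".join(f"{n}{a}" if a == b else f"{n}{a}-{b}" for n, a, b in runs)

numbers = ['21', '23a', '23b', '23k', '23l', '23x', '25', '33a', '33b',
           '33c', '33d', '33e', '33f', '34', '34', '35a', '35a']
print(simplify(numbers))  # 21/23a-b/23k-l/23x/25/33a-f/34/35a
```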