I have a list of strings representing numbers. I can't convert them with int() because some of the numbers have letters attached, like '33a' or '33b'.
['21', '23a', '23b', '23k', '23l', '23x', '25', '33a', '33b', '33c', '33d', '33e', '33f', '34', '34', '35a', '35a' ]
My goal is to concatenate the numbers into one string, separated by forward slashes.
If a number is repeated and its attached letters are consecutive in the alphabet, the representation should be simplified as follows:
['23a'/'23b'] --> '23a-b'
If a number without letters is repeated, it should be listed only once. The same applies to repeated identical combinations of number and letter.
For the complete example, the desired output looks like this:
'21/23a-b/23k-l/23x/25/33a-f/34/35a'
With the following code I can concatenate the numbers and exclude duplicates, but I haven't managed to collapse the numbers with letters into ranges as in the example above.
numbers = ['21', '23a', '23b', '23k', '23l', '23x', '25', '33a', '33b', '33c', '33d', '33e', '33f', '34', '34', '35a', '35a' ]
concat_numbers = ""
numbers_set = list(set(numbers))
numbers_set.sort()
for number in numbers_set:
    concat_numbers += number + "/"
print(concat_numbers)
>>> '21/23a/23b/23k/23l/23x/25/33a/33b/33c/33d/33e/33f/34/35a/'
Any hints on how to achieve this in the most pythonic way?
This can be done by collecting the letters per number in a defaultdict(list) and then rebuilding your output from it, like so:
data = ['21', '23a', '23b', '23k', '23l', '23x', '25', '33a', '33b',
        '33c', '33d', '33e', '33f', '34', '34', '35a', '35a']
data.sort() # easier if letters are sorted - so sort it
from collections import defaultdict
from itertools import takewhile
d = defaultdict(list)
for n in data:
    # split in number/letters
    number = ''.join(takewhile(str.isdigit, n))
    letter = n[len(number):]
    # add to dict
    d[number].append(letter)
print(d)
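One caveat about the `data.sort()` call above: it sorts plain strings, so an entry like '100' would come before '21'. That does not matter for the sample data, where all numbers have two digits. If your real data mixes digit counts, a sort key along these lines might help. This is only a sketch: the helper name `split_key` is made up for illustration, and it assumes every entry is digits followed by optional lowercase letters.

```python
import re


def split_key(entry):
    """Turn '23a' into (23, 'a') so sorting is numeric first, then by letters."""
    # assumption: entries look like digits followed by optional lowercase letters
    match = re.fullmatch(r"(\d+)([a-z]*)", entry)
    if match is None:
        raise ValueError(f"unexpected entry: {entry!r}")
    return int(match.group(1)), match.group(2)


sample = ['100', '21', '23b', '23a', '3']
print(sorted(sample))                 # ['100', '21', '23a', '23b', '3']
print(sorted(sample, key=split_key))  # ['3', '21', '23a', '23b', '100']
```

With such a key, `data.sort(key=split_key)` would replace the plain `data.sort()`.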
We now have a dict with the numbers as keys and a list of their letters as values, which needs to be cleaned up further:
# concat letters that follow each other
def unify(letters):
    u = []
    # remember start/end values of the current range
    first = letters[0]
    last = letters[0]
    # iterate the remaining letters (the input must be sorted)
    for letter in letters[1:]:
        if letter == last:
            # duplicate, nothing to do
            continue
        if len(last) == 1 and len(letter) == 1 and ord(last) == ord(letter) - 1:
            # a->b letters, move last forward
            last = letter
        else:
            # letter range stopped, add a single letter or a range to the list
            u.append(first if last == first else f"{first}-{last}")
            # start over with new values
            first = letter
            last = letter
    # add the final single letter or range
    u.append(first if last == first else f"{first}-{last}")
    # an empty entry stands for the bare number without a letter
    return ",".join(u)
# unify letters
for key, value in d.items():
    d[key] = unify(value)
print(d)
and then construct the final output:
r = "/".join(f"{ky}{v}" for ky,vl in d.items() for v in vl.split(","))
print(r)
Output:
# collected key/values after splitting
defaultdict(<class 'list'>,
{'21': [''], '23': ['a', 'b', 'k', 'l', 'x'],
'25': [''], '33': ['a', 'b', 'c', 'd', 'e', 'f'],
'34': ['', ''], '35': ['a', 'a']})
# unified values
defaultdict(<class 'list'>,
{'21': '', '23': 'a-b,k-l,x', '25': '',
'33': 'a-f', '34': '', '35': 'a'})
# as string
21/23a-b/23k-l/23x/25/33a-f/34/35a
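For comparison, the same result can be sketched more compactly with itertools.groupby. This is an alternative take rather than the approach above. The function name `compress` is made up for illustration, and the code assumes each entry is digits followed by at most one lowercase letter, with no leading zeros (the number is converted to int so that sorting is numeric).

```python
import re
from itertools import groupby


def compress(entries):
    """Join entries with '/', collapsing runs of consecutive letters per number."""
    pairs = set()  # a set drops duplicate (number, letter) pairs
    for entry in entries:
        # assumption: digits followed by at most one lowercase letter
        match = re.fullmatch(r"(\d+)([a-z]?)", entry)
        if match is None:
            raise ValueError(f"unexpected entry: {entry!r}")
        pairs.add((int(match.group(1)), match.group(2)))

    parts = []
    # sorted() orders by number first, then by letter ('' sorts before 'a')
    for number, group in groupby(sorted(pairs), key=lambda pair: pair[0]):
        letters = [letter for _, letter in group]
        if letters[0] == "":
            # bare number without a letter
            parts.append(str(number))
            letters = letters[1:]
        # consecutive letters share the same value of ord(letter) - position
        runs = groupby(enumerate(letters), key=lambda item: ord(item[1]) - item[0])
        for _, chunk in runs:
            run = [letter for _, letter in chunk]
            suffix = run[0] if len(run) == 1 else f"{run[0]}-{run[-1]}"
            parts.append(f"{number}{suffix}")
    return "/".join(parts)


numbers = ['21', '23a', '23b', '23k', '23l', '23x', '25', '33a', '33b',
           '33c', '33d', '33e', '33f', '34', '34', '35a', '35a']
print(compress(numbers))  # 21/23a-b/23k-l/23x/25/33a-f/34/35a
```

The inner groupby works because the letters are unique and sorted at that point, so every step inside a consecutive run increases ord(letter) and the position by one each, which leaves their difference constant.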