Consider the following string:
AB01CD03
What I want to do is split it into two tokens, namely AB01CD and 03.
In my string the number of digits following the last alpha character is unknown. There is always a sequence of digits at the end of the string.
Now, I can do this:
import re
S = 'AB01CD03'
v, = re.findall(r'(\d+)$', S)
assert v == '03'
...and because I now know the length of v I can deduce how to acquire the preamble using a slice - e.g.,
preamble = S[:-len(v)]
assert preamble == 'AB01CD'
Bearing in mind that the preamble may contain digits, what I'm looking for is a single RE that will reveal the two separate tokens - i.e.,
a, b = re.findall(MAGIC_EXPRESSION, S)
Is this possible?
Yes, like this:
import re
s = 'AB01CD03'
m = re.match(r'^(.+?)(\d+)$', s)
print(m.group(1), m.group(2))
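This works because the first group, (.+?), is non-greedy: it takes as few characters as it can, so the second group, (\d+), is allowed to match all the digits at the end. The anchors ^ and $ ensure the groups sit at the start and end of the string respectively. (With re.match the ^ is redundant, since matching always starts at the beginning of the string, but it does no harm.)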
Result:
AB01CD 03
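To see why the non-greedy quantifier matters, here is a small comparison sketch. The variable names greedy and lazy are just illustrative; the only change between the two patterns is the ? after .+ in the first group.

```python
import re

s = 'AB01CD03'

# Greedy first group: (.+) grabs as much as it can and only gives back
# the single character that (\d+) needs in order to match.
greedy = re.match(r'^(.+)(\d+)$', s).groups()

# Non-greedy first group: (.+?) grabs as little as it can, so (\d+)
# ends up with the whole run of trailing digits.
lazy = re.match(r'^(.+?)(\d+)$', s).groups()

print(greedy)  # ('AB01CD0', '3')
print(lazy)    # ('AB01CD', '03')
```

With the greedy version the last digit is split off on its own, which is not what the question asks for.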
Closer to the syntax you were asking for:
a, b = re.match(r'^(.+?)(\d+)$', s).groups()
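Note that re.match returns None when the string does not fit the pattern (for example, when it has no trailing digits), so calling .groups() directly would raise AttributeError. Below is a minimal sketch of a wrapper that makes that case explicit. The function name split_trailing_digits and the choice to raise ValueError are my own assumptions, not part of the question. It uses re.fullmatch, which makes the ^ and $ anchors unnecessary. The last line shows that re.findall can also be used, since it returns a list of tuples when the pattern has more than one group.

```python
import re

# Same pattern as above, without anchors because fullmatch() already
# requires the whole string to match.
_TRAILING_DIGITS = re.compile(r'(.+?)(\d+)')


def split_trailing_digits(text):
    """Return (preamble, digits) for a string that ends in digits.

    Hypothetical helper: raises ValueError if the string has no
    trailing digits or has nothing in front of them.
    """
    m = _TRAILING_DIGITS.fullmatch(text)
    if m is None:
        raise ValueError(f'no preamble followed by trailing digits in {text!r}')
    return m.groups()


a, b = split_trailing_digits('AB01CD03')
print(a, b)  # AB01CD 03

# findall() form: one match, so unpack the single tuple from the list.
(c, d), = re.findall(r'^(.+?)(\d+)$', 'AB01CD03')
```

One thing to be aware of: because the first group needs at least one character, a string made up only of digits such as '123' is split into '1' and '23'.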