I have strings of the form Version 1.4.0\n and Version 1.15.6\n, and I'd like a simple way of extracting the three numbers from them. I know I can put variables into a string with the format method; I basically want to do that backwards, like this:
# So I know I can do this:
x, y, z = 1, 4, 0
print 'Version {0}.{1}.{2}\n'.format(x,y,z)
# Output is 'Version 1.4.0\n'
# But I'd like to be able to reverse it:
mystr='Version 1.15.6\n'
a, b, c = mystr.unformat('Version {0}.{1}.{2}\n')
# And have the result that a, b, c = 1, 15, 6
I found that someone else asked the same question, but the answer was specific to their particular case: Use Python format string in reverse for parsing
A general answer (how to do format() in reverse) would be great! An answer for my specific case would be very helpful too, though.
Actually, the Python regular expression library already provides the general functionality you are asking for. You just have to change the syntax of the pattern slightly:
>>> import re
>>> from operator import itemgetter
>>> mystr='Version 1.15.6\n'
>>> m = re.match(r'Version (?P<_0>.+)\.(?P<_1>.+)\.(?P<_2>.+)', mystr)
>>> list(map(itemgetter(1), sorted(m.groupdict().items())))
['1', '15', '6']
As you can see, you have to change the (un)format strings from {0} to (?P<_0>.+). You could even require digits with (?P<_0>\d+). In addition, you have to escape some of the characters to prevent them from being interpreted as regex special characters. But this in turn can be automated, e.g. with
>>> re.sub(r'\\{(\d+)\\}', r'(?P<_\1>.+)', re.escape('Version {0}.{1}.{2}'))
'Version\\ (?P<_0>.+)\\.(?P<_1>.+)\\.(?P<_2>.+)'
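Putting the two steps together, here is a minimal sketch of what the unformat asked for in the question could look like. The function name unformat and its argument order are my own choice (it is a hypothetical helper, not part of the standard library), and it only handles plain positional fields like {0}, each used once. It also sorts the group names numerically, since a plain sorted() on the names would put _10 before _2.

```python
import re

def unformat(fmt, text):
    """Hypothetical helper: recover the values of {0}, {1}, ... from text."""
    # Escape regex metacharacters in the literal parts of the format string;
    # the braces of each field come out as \{ and \}.
    escaped = re.escape(fmt)
    # Turn each escaped positional field \{N\} into a named group _N.
    pattern = re.sub(r'\\\{(\d+)\\\}', r'(?P<_\1>.+)', escaped)
    m = re.match(pattern, text)
    if m is None:
        raise ValueError('text does not match the format string')
    groups = m.groupdict()
    # Order by field number (numerically, not as strings).
    return [groups[name] for name in sorted(groups, key=lambda n: int(n[1:]))]

a, b, c = map(int, unformat('Version {0}.{1}.{2}\n', 'Version 1.15.6\n'))
```

The values come back as strings, so the map(int, ...) at the end is what gives a, b, c = 1, 15, 6. If you only ever expect numbers, swapping .+ for \d+ in the replacement makes the match stricter.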