
Convert, or unformat, a string to variables (like format(), but in reverse) in Python


I have strings of the form Version 1.4.0\n and Version 1.15.6\n, and I'd like a simple way of extracting the three numbers from them. I know I can put variables into a string with the format method; I basically want to do that backwards, like this:

# So I know I can do this:
x, y, z = 1, 4, 0
print('Version {0}.{1}.{2}\n'.format(x, y, z))
# Output is 'Version 1.4.0\n'

# But I'd like to be able to reverse it:

mystr='Version 1.15.6\n'
a, b, c = mystr.unformat('Version {0}.{1}.{2}\n')

# And have the result that a, b, c = 1, 15, 6

I found someone else who asked the same question, but the reply was specific to their particular case: Use Python format string in reverse for parsing

A general answer (how to do format() in reverse) would be great! An answer for my specific case would be very helpful too though.


Solution

  • Python's regular expression library already provides the general functionality you are asking for. You just have to change the syntax of the pattern slightly:

    >>> import re
    >>> from operator import itemgetter
    >>> mystr='Version 1.15.6\n'
    >>> m = re.match(r'Version (?P<_0>.+)\.(?P<_1>.+)\.(?P<_2>.+)', mystr)
    >>> list(map(itemgetter(1), sorted(m.groupdict().items())))
    ['1', '15', '6']
    

    As you can see, you have to change the (un)format strings from {0} to (?P<_0>.+). You could even require digits with (?P<_0>\d+). In addition, you have to escape some of the characters to prevent them from being interpreted as regex special characters. This in turn can be automated, e.g. with

    >>> re.sub(r'\\{(\d+)\\}', r'(?P<_\1>.+)', re.escape('Version {0}.{1}.{2}'))
    'Version\\ (?P<_0>.+)\\.(?P<_1>.+)\\.(?P<_2>.+)'
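
The two steps above (escape the template, then turn each {N} into a named group) can be combined into one helper. The following is only a sketch under a few assumptions: the name unformat and the field_pattern parameter are made up for illustration, only plain numbered fields such as {0} are handled (no format specs, no repeated indices), and results are ordered by numeric index, because sorting the group names as strings would put _10 before _2.

```python
import re


def unformat(template, text, field_pattern=r'.+?'):
    """Hypothetical helper: recover the values that str.format() filled in.

    Handles only numbered fields such as {0}; each index may appear once.
    """
    # Escape the template so literal characters such as '.' match themselves.
    escaped = re.escape(template)
    # re.escape turns '{0}' into '\{0\}'; replace each with a named group.
    # A function is used as the replacement so that backslashes inside
    # field_pattern (e.g. r'\d+') are inserted literally.
    pattern = re.sub(
        r'\\{(\d+)\\}',
        lambda m: '(?P<_%s>%s)' % (m.group(1), field_pattern),
        escaped,
    )
    match = re.fullmatch(pattern, text)
    if match is None:
        raise ValueError('text does not match template')
    groups = match.groupdict()
    # Sort by numeric index rather than by name.
    return [groups[name] for name in sorted(groups, key=lambda n: int(n[1:]))]


# Usage: values come back as strings, so convert them as needed.
a, b, c = (int(s) for s in unformat('Version {0}.{1}.{2}\n',
                                    'Version 1.15.6\n', r'\d+'))
print(a, b, c)  # 1 15 6
```

Matching the whole string with re.fullmatch means trailing text, such as the newline in the question, has to be part of the template; use re.match instead if a prefix match is enough.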