I am familiar with the ability to insert variables into a string using Templates, like this:
from string import Template
Template('value is between $min and $max').substitute(min=5, max=10)
What I now want to know is if it is possible to do the reverse. I want to take a string, and extract the values from it using a template, so that I have some data structure (preferably just named variables, but a dict is fine) that contains the extracted values. For example:
>>> string = 'value is between 5 and 10'
>>> d = Backwards_template('value is between $min and $max').extract(string)
>>> print(d)
{'min': '5', 'max':'10'}
Is this possible?
This is what regular expressions are for:
import re
string = 'value is between 5 and 10'
m = re.match(r'value is between (.*) and (.*)', string)
print(m.group(1), m.group(2))
Output:
5 10
Update 1. Names can be given to groups:
m = re.match(r'value is between (?P<min>.*) and (?P<max>.*)', string)
print(m.group('min'), m.group('max'))
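Since the question asks for a dict, note that a match object's groupdict() method returns one keyed by group name. Below is a minimal sketch that also builds the pattern from a $-style template. The name extract_with_template is a hypothetical helper, not part of the standard library, and it assumes simple $name placeholders only (no ${name} or $$ escapes, and no placeholder used twice).

```python
import re

def extract_with_template(template, text):
    """Hypothetical helper: turn '$name' placeholders into named groups."""
    # Splitting on a capturing group keeps the placeholder names,
    # which land at the odd indices of the resulting list.
    parts = re.split(r'\$([A-Za-z_][A-Za-z0-9_]*)', template)
    pattern = ''
    for i, part in enumerate(parts):
        if i % 2:
            pattern += '(?P<%s>.*)' % part   # placeholder -> named group
        else:
            pattern += re.escape(part)       # literal text, escaped
    m = re.fullmatch(pattern, text)
    return m.groupdict() if m else None

d = extract_with_template('value is between $min and $max',
                          'value is between 5 and 10')
print(d)  # {'min': '5', 'max': '10'}
```

The values come back as strings, just as in the question's desired output; converting them is a separate step.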
But this feature is not used often, as there are usually enough problems with a more important aspect: how to capture exactly what you want. In this particular case that's not a big deal, but even here: what if the string is 'value is between 1 and 2 and 3'? Should the string be accepted, and if so, what are the min and max?
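To make the ambiguity concrete, here is a small sketch comparing three patterns on that string. The variable names and the choice of \d+ for the stricter version are illustrative assumptions, not the only reasonable options.

```python
import re

s = 'value is between 1 and 2 and 3'

# Greedy: the first group grabs as much as it can.
greedy = re.match(r'value is between (?P<min>.*) and (?P<max>.*)', s)
# Non-greedy first group: stops at the first ' and '.
lazy = re.match(r'value is between (?P<min>.*?) and (?P<max>.*)', s)
# Strict: digits only, and the whole string must match.
strict = re.fullmatch(r'value is between (?P<min>\d+) and (?P<max>\d+)', s)

print(greedy.groupdict())  # {'min': '1 and 2', 'max': '3'}
print(lazy.groupdict())    # {'min': '1', 'max': '2 and 3'}
print(strict)              # None
```

All three behaviors are defensible; which one is right depends on what the input is supposed to look like.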
Update 2. Rather than making a precise regex, it's sometimes easier to combine regular expressions and "regular" code like this:
m = re.match(r'value is between (?P<min>.*) and (?P<max>.*)', string)
try:
    value_min = float(m.group('min'))
    value_max = float(m.group('max'))
except (AttributeError, ValueError):  # no match or failed conversion
    value_min = None
    value_max = None
This combined approach is especially worth remembering when your text consists of many chunks (like phrases in quotes of different types) to be processed: in tricky cases, it's harder to define a single regex to handle both delimiters and contents of chunks than to define several steps like text.split(), optional merging of chunks, and independent processing of each chunk (using regexes and other means).
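A rough sketch of that split-then-process idea, assuming for illustration that the chunks are separated by semicolons. The sample text and the parse_chunk helper are made up for this example, and the optional merging step is left out.

```python
import re

text = 'between 5 and 10; between 1.5 and x; nothing here'

def parse_chunk(chunk):
    """Hypothetical helper: return (min, max) as floats, or None."""
    m = re.search(r'between (\S+) and (\S+)', chunk)
    try:
        return float(m.group(1)), float(m.group(2))
    except (AttributeError, ValueError):  # no match or failed conversion
        return None

# Step 1: split on the delimiter; step 2: handle each chunk on its own.
results = [parse_chunk(c.strip()) for c in text.split(';')]
print(results)  # [(5.0, 10.0), None, None]
```

Each step stays simple, and a chunk that fails to parse does not affect the others.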