I'm trying to parse a line of text that contains a sentence, optionally followed by some key/value pairs on the same line. Not only are the key/value pairs optional, they are also dynamic: the keys are not known in advance. I'm looking for a result something like:
Input:
"There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
Output:
Values = {'theSentence' : "There was a cow at home.",
'home' : "mary",
'cowname' : "betsy",
'date' : "10-jan-2013"
}
Input:
"Mike ordered a large hamburger. lastname=Smith store=burgerville"
Output:
Values = {'theSentence' : "Mike ordered a large hamburger.",
'lastname' : "Smith",
'store' : "burgerville"
}
Input:
"Sam is nice."
Output:
Values = {'theSentence' : "Sam is nice."}
Thanks for any input/direction. I know the sentences make this look like a homework problem, but I'm just a Python newbie. I know it's probably a regex solution, but I'm not the best with regex.
I'd use re.sub:
import re
s = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
d = {}
def add(m):
    # Store the pair, then return '' so the match is removed from the text.
    d[m.group(1)] = m.group(2)
    return ''
s = re.sub(r'(\w+)=(\S+)', add, s)
d['theSentence'] = s.strip()
print(d)
Here's a more compact version if you prefer:
d = {}
d['theSentence'] = re.sub(r'(\w+)=(\S+)',
    lambda m: d.setdefault(m.group(1), m.group(2)) and '',
    s).strip()
Or maybe findall is a better option:
rx = r'(\w+)=(\S+)|(\S.+?)(?=\w+=|$)'
d = {
    a or 'theSentence': (b or c).strip()
    for a, b, c in re.findall(rx, s)
}
print(d)
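To check the re.sub approach against all three inputs from the question, it can be wrapped in a function. This is only a minimal sketch: the names parse_line, PAIR_RE and collect are made up for illustration, and it assumes that every word=value token on the line is a key/value pair rather than part of the sentence.

```python
import re

# Same pattern as above: a word, '=', then a run of non-whitespace.
PAIR_RE = re.compile(r'(\w+)=(\S+)')

def parse_line(line):
    """Split a line into its sentence and any key=value pairs (hypothetical helper)."""
    values = {}

    def collect(match):
        # Record the pair and replace it with '' so only the sentence remains.
        values[match.group(1)] = match.group(2)
        return ''

    # Assumption: no pair uses the key 'theSentence'; if one did, it would be overwritten here.
    values['theSentence'] = PAIR_RE.sub(collect, line).strip()
    return values

samples = [
    "There was a cow at home. home=mary cowname=betsy date=10-jan-2013",
    "Mike ordered a large hamburger. lastname=Smith store=burgerville",
    "Sam is nice.",
]
for sample in samples:
    print(parse_line(sample))
```

Note that a token such as x=1 in the middle of the sentence would also be pulled out as a pair, so this only suits input where the sentence itself contains no word=value text.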