I want to extract quoted text if it is one or two words long. This works with the following code:
import re

mysentences = ['Kids, you "tried" your "best" and you failed miserably. The "lesson" is, "never try."',
"Just because I don’t 'care' doesn’t mean I don’t understand."]
quotation = []
rx = r'"((?:\w+[ .]*){1,2})"'
for sentence in mysentences:
    quotation.append(re.findall(rx, sentence))
print(quotation)
But this doesn't get me the 'care' from the second sentence, because the second sentence is in double quotation marks and the word itself is in single quotes. I can get it with the following:
r"'((?:\w+[ .]*){1,2})'"
The question is: how can I join the conditions? With
rx = r'"((?:\w+[ .]*){1,2})"' or r"'((?:\w+[ .]*){1,2})'"
it only gets me the first mentioned condition.
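The reason only the first condition applies is that Python evaluates `or` on the two strings before `re` ever sees them: `or` returns its first truthy operand, and any non-empty string is truthy. A minimal sketch of that behaviour, plus one possible way to combine the patterns with regex alternation (the names `double_rx`, `single_rx`, `combined` and `alternation` are only illustrative):

```python
import re

double_rx = r'"((?:\w+[ .]*){1,2})"'
single_rx = r"'((?:\w+[ .]*){1,2})'"

# `or` works on the Python strings, not on the regex:
# it returns the first truthy operand, so single_rx is discarded.
combined = double_rx or single_rx
print(combined == double_rx)  # True

# Alternation inside a single pattern keeps both conditions.
# With two capturing groups, findall returns one tuple per match,
# and the group that did not take part is an empty string.
alternation = double_rx + "|" + single_rx
sentence = "Just because I don’t 'care' doesn’t mean I don’t understand."
result = re.findall(alternation, sentence)
print(result)  # [('', 'care')]
```

The tuples with an empty slot are awkward to work with, which is one reason a backreference can be the tidier option.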
Using your current pattern, you could make use of a capturing group and a backreference \1
to match the accompanying single or double quote.
The match will now be in the second capturing group.
(['"])((?:\w+[ .]*){1,2})\1
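As a small illustration of what the backreference does, the sketch below checks that `\1` only accepts the same quote character that opened the match. The two test strings are made up for the demonstration.

```python
import re

rx = r"(['\"])((?:\w+[ .]*){1,2})\1"

# Same quote character on both sides: \1 matches,
# and the quoted text is in group 2.
same = re.search(rx, "a 'care' b")
print(same.group(2))  # care

# Opening double quote but closing single quote:
# \1 must repeat the double quote, so there is no match.
mixed = re.search(rx, "a \"care' b")
print(mixed)  # None
```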
Note that repeating the character class [ .]* could potentially also match, for example, never try... ....
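This over-matching can be checked quickly. In the sketch below, `text` is a made-up input containing a run of dots and spaces inside the quotes:

```python
import re

rx = r"(['\"])((?:\w+[ .]*){1,2})\1"
text = '"never try... ...."'

# [ .]* accepts any mix of spaces and dots after each word,
# so the whole trailing run ends up inside group 2.
matches = [m.group(2) for m in re.finditer(rx, text)]
print(matches)  # ['never try... ....']
```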
If you want to match 1 or 2 words where there can be a single optional dot at the end, you could match 1+ word chars followed by an optional group to match 1+ spaces and 1+ word chars followed by an optional dot.
(['"])(\w+(?: +\w+)?\.?)\1
For example
import re
mysentences = ['Kids, you "tried" your "best" and you failed miserably. The "lesson" is, "never try."',
"Just because I don’t 'care' doesn’t mean I don’t understand."]
quotation = []
rx = r"(['\"])((?:\w+[ .]*){1,2})\1"
for sentence in mysentences:
    for m in re.findall(rx, sentence):
        quotation.append(m[1])
print(quotation)
Result
['tried', 'best', 'lesson', 'never try.', 'care']
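The stricter pattern with the single optional dot can be tried the same way. This is only a sketch: `strict_rx` and `rejected` are illustrative names, and `re.finditer` is used here so that group 2 can be read directly.

```python
import re

mysentences = ['Kids, you "tried" your "best" and you failed miserably. The "lesson" is, "never try."',
               "Just because I don’t 'care' doesn’t mean I don’t understand."]

# One word, optionally followed by spaces and a second word,
# then at most one trailing dot, inside matching quotes.
strict_rx = r"(['\"])(\w+(?: +\w+)?\.?)\1"

quotation = [m.group(2)
             for sentence in mysentences
             for m in re.finditer(strict_rx, sentence)]
print(quotation)  # ['tried', 'best', 'lesson', 'never try.', 'care']

# The run of dots and spaces is no longer accepted.
rejected = re.findall(strict_rx, '"never try... ...."')
print(rejected)  # []
```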