I wish to read in a text, use regex to find all instances of a pattern, then print the matching strings. If I use the re.search() method, I can successfully grab and print the first instance of the desired pattern:
import re
text = "Cello is a yellow parakeet who sings with Lillian. Toby is a clown who doesn't sing. Willy is a Wonka. Cello is a yellow Lillian."
match = re.search(r'(cello|Cello)(\W{1,80}\w{1,60}){0,9}\W{0,20}(lillian|Lillian)', text)
print(match.group())
Unfortunately, the re.search() method only finds the first instance of the desired pattern, so I substituted re.findall():
import re
text = "Cello is a yellow parakeet who sings with Lillian. Toby is a clown who doesn't sing. Willy is a Wonka. Cello is a yellow Lillian."
match = re.findall(r'(cello|Cello)(\W{1,80}\w{1,60}){0,9}\W{0,20}(lillian|Lillian)', text)
print(match)
This routine finds both instances of the target pattern in the sample text, but I can't find a way to print the sentences in which the patterns occur. The second snippet prints [('Cello', ' with', 'Lillian'), ('Cello', ' yellow', 'Lillian')] instead of the output I want: "Cello is a yellow parakeet who sings with Lillian. Cello is a yellow Lillian."
Is there a way to modify the second bit of code so as to obtain this desired output? I would be most grateful for any advice anyone can lend on this question.
I would just make a big capturing group around the two endpoints:
import re
text = "Cello is a yellow parakeet who sings with Lillian. Toby is a clown who doesn't sing. Willy is a Wonka. Cello is a yellow Lillian."
for match in re.findall(r'(Cello(?:\W{1,80}\w{1,60}){0,9}\W{0,20}Lillian)', text, flags=re.I):
    print(match)
Now, you get the two sentences:
Cello is a yellow parakeet who sings with Lillian
Cello is a yellow Lillian
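If you would rather leave the original pattern and its groups untouched, another option is re.finditer(), which yields match objects instead of tuples of groups; calling group(0) on each one returns the whole matched text. A minimal sketch using the pattern from the question (the variable names pattern and sentences are just for illustration):

```python
import re

text = ("Cello is a yellow parakeet who sings with Lillian. "
        "Toby is a clown who doesn't sing. Willy is a Wonka. "
        "Cello is a yellow Lillian.")

# The pattern from the question, with all three capturing groups kept.
pattern = r'(cello|Cello)(\W{1,80}\w{1,60}){0,9}\W{0,20}(lillian|Lillian)'

# finditer() yields one match object per hit; group(0) is the full match,
# regardless of how many capturing groups the pattern contains.
sentences = [m.group(0) for m in re.finditer(pattern, text)]

for sentence in sentences:
    print(sentence)
```

This should print the same two strings, and the individual groups are still available through m.group(1), m.group(3), and so on if you need them later.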
Some tips:
flags=re.I makes the regex case-insensitive, so Cello matches both cello and Cello.
(?:foo) is just like (foo), except that the group is not captured, so its text won't appear as a separate item in the results. It's useful for grouping things without capturing them.
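To see both tips in isolation, here is a small sketch on a made-up sample string (sample, captured, whole, and case_sensitive are illustrative names, not part of the question). It shows how re.findall() returns only the group contents when the pattern has a capturing group, and the whole match when the group is non-capturing:

```python
import re

sample = "Cello sings. cello naps."

# One capturing group: findall() returns only what the group captured.
captured = re.findall(r'(cello) \w+', sample, flags=re.I)

# Non-capturing group: no groups to report, so findall() returns whole matches.
whole = re.findall(r'(?:cello) \w+', sample, flags=re.I)

# Without re.I, only the lowercase occurrence matches.
case_sensitive = re.findall(r'cello \w+', sample)

print(captured)        # ['Cello', 'cello']
print(whole)           # ['Cello sings', 'cello naps']
print(case_sensitive)  # ['cello naps']
```

That is why the pattern in the question, with its three capturing groups, gave back tuples of fragments rather than the full sentences.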