Tags: python, regex, printing, proximity

Python: Convert tuples from re.findall into string?


I wish to read in a text, use regex to find all instances of a pattern, then print the matching strings. If I use the re.search() method, I can successfully grab and print the first instance of the desired pattern:

import re

text = "Cello is a yellow parakeet who sings with Lillian. Toby is a clown who doesn't sing. Willy is a Wonka. Cello is a yellow Lillian."

match = re.search(r'(cello|Cello)(\W{1,80}\w{1,60}){0,9}\W{0,20}(lillian|Lillian)', text)
print(match.group())

Unfortunately, the re.search() method only finds the first instance of the desired pattern, so I substituted re.findall():

import re

text = "Cello is a yellow parakeet who sings with Lillian. Toby is a clown who doesn't sing. Willy is a Wonka. Cello is a yellow Lillian."

match = re.findall(r'(cello|Cello)(\W{1,80}\w{1,60}){0,9}\W{0,20}(lillian|Lillian)', text)
print(match)

This routine finds both instances of the target pattern in the sample text, but I can't find a way to print the sentences in which the patterns occur. Printing the result of this second snippet yields a list of tuples, [('Cello', ' with', 'Lillian'), ('Cello', ' yellow', 'Lillian')], instead of the output I want: "Cello is a yellow parakeet who sings with Lillian. Cello is a yellow Lillian."

Is there a way to modify the second bit of code so as to obtain this desired output? I would be most grateful for any advice anyone can lend on this question.


Solution

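  • When a pattern contains capturing groups, re.findall() returns the groups rather than the whole match. I would just make a big capturing group around the whole pattern, from one endpoint to the other, and make the inner group non-capturing: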

    import re
    
    text = "Cello is a yellow parakeet who sings with Lillian. Toby is a clown who doesn't sing. Willy is a Wonka. Cello is a yellow Lillian."
    
    for match in re.findall(r'(Cello(?:\W{1,80}\w{1,60}){0,9}\W{0,20}Lillian)', text, flags=re.I):
        print(match)
    

    Now, you get the two sentences:

    Cello is a yellow parakeet who sings with Lillian
    Cello is a yellow Lillian
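
An alternative sketch, not part of the original answer: re.finditer() yields match objects, and group(0) on a match object is always the entire matched text, so no outer capturing group is needed. The names pattern, sentences, and joined are illustrative, and the joining step assumes each match should end with a period, as in the desired output from the question.

```python
import re

text = ("Cello is a yellow parakeet who sings with Lillian. "
        "Toby is a clown who doesn't sing. Willy is a Wonka. "
        "Cello is a yellow Lillian.")

# Same pattern as in the answer, but without the outer capturing group.
pattern = re.compile(r'Cello(?:\W{1,80}\w{1,60}){0,9}\W{0,20}Lillian', flags=re.I)

# group(0) is the whole match, regardless of any groups inside the pattern.
sentences = [m.group(0) for m in pattern.finditer(text)]

# Assumption: restore the sentence-ending periods when joining into one string.
joined = ". ".join(sentences) + "."
print(joined)
# Cello is a yellow parakeet who sings with Lillian. Cello is a yellow Lillian.
```

Because this pattern has no capturing groups at all, re.findall() would also return the whole matches here; finditer() is mainly useful when you additionally want positions via m.start() and m.end().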
    

    Some tips:

    • flags=re.I makes the regex case-insensitive, so Cello matches both cello and Cello.
    • (?:foo) is a non-capturing group: it groups just like (foo), but the text it matches is not returned as a separate group. It's useful for grouping things (for example, to repeat them) without capturing them.
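
To make the tips above concrete, here is a small sketch of how the groups in a pattern change what re.findall() returns. The shortened sample string and simplified patterns are assumptions chosen for readability, not the ones from the question.

```python
import re

sample = "Cello sings with Lillian."

# Three capturing groups: one tuple per match. A repeated group keeps
# only the text from its last repetition (' with' here).
three_groups = re.findall(r'(Cello)(\W\w+)*\W(Lillian)', sample)
print(three_groups)   # [('Cello', ' with', 'Lillian')]

# Making the middle group non-capturing removes it from the tuple.
two_groups = re.findall(r'(Cello)(?:\W\w+)*\W(Lillian)', sample)
print(two_groups)     # [('Cello', 'Lillian')]

# Exactly one capturing group: a list of strings for that group.
one_group = re.findall(r'(Cello(?:\W\w+)*\WLillian)', sample)
print(one_group)      # ['Cello sings with Lillian']

# No capturing groups: a list of whole matches.
no_groups = re.findall(r'Cello(?:\W\w+)*\WLillian', sample)
print(no_groups)      # ['Cello sings with Lillian']
```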