Tags: python, regex, quotation-marks

How to simultaneously search for two possible quotation marks with regular expressions?


I want to extract text in quotation marks if it is one or two words long. This works with the following code:

import re

mysentences = ['Kids, you "tried" your "best" and you failed miserably. The "lesson" is, "never try."',
               "Just because I don’t 'care' doesn’t mean I don’t understand."]
quotation = []
rx = r'"((?:\w+[ .]*){1,2})"'
for sentence in mysentences:
    quotation.append(re.findall(rx, sentence))
print(quotation)

But this doesn't get me the 'care' from the second sentence, because that word is in single quotation marks (the string itself is delimited by double quotes). I can get it with the following pattern:

r"'((?:\w+[ .]*){1,2})'"

The question is, how can I combine the two conditions? With

rx = r'"((?:\w+[ .]*){1,2})"' or r"'((?:\w+[ .]*){1,2})'"

I only get the matches for the first pattern.


Solution

  • Keeping your current pattern, you can capture the opening quote in a group and use the backreference \1 to require the same quote character (single or double) at the end.

    The quoted text will then be in the second capturing group.

    (['"])((?:\w+[ .]*){1,2})\1
    


    Note that the repeated character class [ .]* accepts any run of spaces and dots, so the pattern could also match, for example, never try... ....
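A quick way to see this over-matching is to run the backreference pattern on a string with trailing dots. This is only an illustrative sketch; the sample sentence and the variable names are made up for the demonstration.

```python
import re

# Loose pattern from above: [ .]* after each word allows any run of spaces and dots.
loose = re.compile(r"""(['"])((?:\w+[ .]*){1,2})\1""")

# Hypothetical sample input, not taken from the question.
sample = 'He said "never try... ...." and left.'

# Group 2 holds the text between the quotes.
loose_matches = [m.group(2) for m in loose.finditer(sample)]
print(loose_matches)  # ['never try... ....']
```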

    If you want to match one or two words with a single optional dot at the end, match one or more word characters, then an optional group of one or more spaces followed by one or more word characters, then an optional dot.

    (['"])(\w+(?: +\w+)?\.?)\1
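As a rough check of the stricter pattern, the sketch below runs it on a few sample strings. The samples and variable names are assumptions chosen for illustration; only the pattern itself comes from the answer.

```python
import re

# Stricter pattern: one word, optionally a second word after spaces,
# optionally one final dot, closed by the same quote that opened it.
strict = re.compile(r"""(['"])(\w+(?: +\w+)?\.?)\1""")

# Hypothetical sample strings for illustration.
samples = [
    'The "lesson" is, "never try."',
    "Just because I 'care' now.",
    'He said "never try... ...." and left.',
]

# Group 2 holds the text between the quotes.
strict_matches = [[m.group(2) for m in strict.finditer(s)] for s in samples]
print(strict_matches)  # [['lesson', 'never try.'], ['care'], []]
```

The last sample yields no match because more than one dot follows the second word.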
    


    For example

    import re

    mysentences = ['Kids, you "tried" your "best" and you failed miserably. The "lesson" is, "never try."',
                   "Just because I don’t 'care' doesn’t mean I don’t understand."]
    quotation = []
    # Group 1 captures the opening quote; \1 requires the same quote to close.
    rx = r"(['\"])((?:\w+[ .]*){1,2})\1"
    for sentence in mysentences:
        # findall returns (quote, text) tuples; keep only the text.
        for m in re.findall(rx, sentence):
            quotation.append(m[1])

    print(quotation)
    

    Result

    ['tried', 'best', 'lesson', 'never try.', 'care']
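
As for the `or` attempt in the question: in Python, `a or b` is evaluated before the assignment and returns its first truthy operand, and any non-empty string is truthy. So `rx` ends up holding only the double-quote pattern, and the single-quote pattern is never used. A small sketch that checks this (the names `double_rx` and `single_rx` are illustrative):

```python
double_rx = r'"((?:\w+[ .]*){1,2})"'
single_rx = r"'((?:\w+[ .]*){1,2})'"

# `or` returns its first truthy operand; a non-empty string is truthy,
# so the second pattern is silently discarded.
rx = double_rx or single_rx
print(rx == double_rx)  # True
```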