python, python-re

Extracting text between quotation marks in a specific pattern


string = "'Banana' aaaa 'Melon' aaa 'Strawberry' aaaaa 'Apple' aaaa 'Mango'bbb 'Watermelon' aaaa"

What regular expression should I use to get 'Mango' and 'Watermelon', the quoted words immediately to the left and right of bbb?

\'.+?\'

All I know so far is how to match the words between the quotes.


Solution

  • You may use re.findall twice:

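    import re

    string = "'Banana' aaaa 'Melon' aaa 'Strawberry' aaaaa 'Apple' aaaa 'Mango'bbb 'Watermelon' aaaa"
    # Isolate "'Mango'bbb 'Watermelon'"; [0] raises IndexError if nothing matches.
    segment = re.findall(r"'\S+'\s*bbb\s*'\S+'", string)[0]
    # Pull out the text inside each pair of quotes.
    matches = re.findall(r"'(.*?)'", segment)
    print(matches)  # ['Mango', 'Watermelon']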
    

    The first call to re.findall narrows the string to just bbb with one single-quoted term on each side. The second call to re.findall then extracts the quoted terms themselves.
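
As a possible variation on the two-step approach above, the same result can be obtained in one pass by putting capture groups around the two quoted terms. This is only a sketch: the helper name `quoted_neighbors` and its `marker` parameter are made up for illustration, and it assumes the quoted terms contain no single quotes themselves.

```python
import re

def quoted_neighbors(text, marker="bbb"):
    """Return the single-quoted terms directly left and right of marker, or None.

    Hypothetical helper for illustration; assumes terms contain no single quotes.
    """
    # Group 1: quoted term before the marker; group 2: quoted term after it.
    pattern = r"'([^']*)'\s*" + re.escape(marker) + r"\s*'([^']*)'"
    match = re.search(pattern, text)
    return list(match.groups()) if match else None

string = "'Banana' aaaa 'Melon' aaa 'Strawberry' aaaaa 'Apple' aaaa 'Mango'bbb 'Watermelon' aaaa"
print(quoted_neighbors(string))  # ['Mango', 'Watermelon']
```

Using re.search with a None check avoids the IndexError that indexing an empty re.findall result would raise when bbb is not present.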