Tags: regex, python-2.7, text, quotes

Extracting quotes between apostrophes in Python


I'm working on a regex to capture text in quotes. It works, but in the plain-text source file the single smart quotes have been converted to apostrophes.

For the regex I have:

r"[\"|\'|\`].+[\"|\'|\`]"

The regex works fine, but it also grabs text between two apostrophes. Is it possible to adjust the regex so it doesn't do this?

"Come up and see me some time" # correct
'Yeah, I wonder if will pick this up to' # correct
`Mmmm. I wonder...` # correct
"Sorry about the mess!" # correct
We don't know who is human. Don't we? # wrong

The last one grabs

't know who is human. Don'
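
The unwanted match can be reproduced with a short script. This is a minimal sketch, assuming the pattern from the question is written as a raw string literal; the names `ORIGINAL`, `line`, and `grabbed` are only illustrative.

```python
import re

# The pattern from the question, written as a raw string literal.
# Note: inside a character class, "|" is a literal pipe, not alternation.
ORIGINAL = re.compile(r"[\"|\'|\`].+[\"|\'|\`]")

line = "We don't know who is human. Don't we?"

# The greedy ".+" runs from the first apostrophe to the last one on the line.
match = ORIGINAL.search(line)
grabbed = match.group(0) if match else None
print(grabbed)  # 't know who is human. Don'
```

Both apostrophes sit inside words ("don't", "Don't"), which is what makes them distinguishable from real quote marks.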

Solution

  • I would also recommend non-word boundaries (\B), as @Wiktor commented, and additionally use a backreference (\1) so that the closing quote must be the same character as the opening one:

    regex = r"\B([\"'`]).+?\1\B"
    

    Test it here: https://regex101.com/r/TOLYVc/3
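
To check the suggested pattern against the lines from the question, something like the following could be used. It is only a sketch: the helper name `extract_quotes` and the `samples` list are made up for illustration, and the pattern is the one given above.

```python
import re

# \B on both sides keeps apostrophes inside words (don't) from counting as
# quotes; group 1 captures the opening quote and \1 requires the same
# character to close it. ".+?" is lazy, so the match stops at the first
# valid closing quote.
QUOTE_RE = re.compile(r"\B([\"'`]).+?\1\B")

def extract_quotes(text):
    """Return every quoted span in text, including the quote characters."""
    return [m.group(0) for m in QUOTE_RE.finditer(text)]

samples = [
    '"Come up and see me some time"',
    "'Yeah, I wonder if will pick this up to'",
    "`Mmmm. I wonder...`",
    '"Sorry about the mess!"',
    "We don't know who is human. Don't we?",
]

for s in samples:
    print(extract_quotes(s))
# The first four lines each print a one-element list holding the whole line;
# the last line prints [] because both apostrophes are inside words.
```

An apostrophe inside a single-quoted span is also tolerated: for "He said 'I don't know' and left." the helper returns ["'I don't know'"], since the apostrophe in "don't" is followed by a word character and so fails the trailing \B.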