python regex text-parsing

Regex find all p tags inside quotes without other text


I'm trying to edit a poorly translated book.

I have text with lots of unnecessary p tags in it. I want to find all p tags inside quotes, as in this piece: “…Hmm. </p>Is… That, really so…?”, and remove them.

I managed to come up with a regex that finds such sentences, (\“.*</p>.*\”), but I can't work out one that selects only the </p> inside the quotes, without the surrounding text, so that I can replace them all in a single click.

Send help please.

edit1: changed all words "brackets" to "quotes".


Solution

  • (?<=\“(.|\n)*)(<\/p>)(?=(.|\n)*\”)
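Note that the pattern above relies on variable-length lookbehind, which Python's built-in re module rejects (it only accepts fixed-width lookbehind). Since the question is tagged python, here is a minimal sketch of one alternative, not necessarily what the answerer had in mind: match each curly-quoted span as a whole and strip the </p> tags inside it with a replacement callback. The names QUOTED, strip_p_in_quotes and sample are made up for illustration, and the sketch assumes quotes are not nested and that every “ has a matching ”.

```python
import re

# One curly-quoted span. The negated class [^“”]* also matches newlines,
# and it stops the match from running past the nearest closing quote.
QUOTED = re.compile(r"“[^“”]*”")


def strip_p_in_quotes(text: str) -> str:
    """Remove </p> tags that occur inside “…” spans; leave all others alone."""
    return QUOTED.sub(lambda m: m.group(0).replace("</p>", ""), text)


# Sample built from the snippet in the question (the surrounding tags are assumed).
sample = "<p>He paused.</p><p>“…Hmm. </p>Is… That, really so…?”</p>"
print(strip_p_in_quotes(sample))
```

Because each quoted span is matched on its own, a </p> sitting between two separate quotations is left untouched, whereas the lookaround pattern would also match it whenever any “ appears earlier and any ” appears later in the text.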