python regex calibre

Regex to match expression followed by lower case character


I want to match a closing tag followed by zero or more spaces/newlines and then an opening tag, but only when that opening tag is followed by a lowercase letter. Examples:

  • text</p> <p>blah matches </p> <p>
  • text</i><i>and more text <b>but not this</b> matches </i><i>
  • text</i> <i>And more text does not match

I tried this: </.*?>\s*\n*\s*<.*>(?=[a-z]), but it doesn't work for the second example, as it will match </i><i> and more text </b> even though the question mark should make it "lazy".


Solution

  • Try:

    </[^>]+>\s*<[^/>]+>(?=[a-z])
    

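    Unlike .*, the class [^>]+ cannot run past the end of a tag, so each half of the pattern stays inside a single tag, and [^/>] keeps the second tag from being a closing tag. Change each '+' to '*' if you also want to match empty tags.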
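
A quick way to check the pattern against the three examples from the question is a short Python script. This is only a sketch: the names TAG_PAIR and find_tag_pairs are made up for illustration, and it uses Python's standard re module rather than calibre's own search-and-replace dialog.

```python
import re

# The pattern from the answer: a closing tag, optional whitespace
# (\s also covers newlines), an opening tag, then a lookahead that
# requires a lowercase ASCII letter without consuming it.
TAG_PAIR = re.compile(r"</[^>]+>\s*<[^/>]+>(?=[a-z])")


def find_tag_pairs(text):
    """Return every closing-tag/opening-tag pair that precedes a lowercase letter."""
    return [m.group(0) for m in TAG_PAIR.finditer(text)]


samples = [
    "text</p> <p>blah",                              # expected: ['</p> <p>']
    "text</i><i>and more text <b>but not this</b>",  # expected: ['</i><i>']
    "text</i> <i>And more text",                     # expected: []
]

for sample in samples:
    print(repr(sample), "->", find_tag_pairs(sample))
```

Because the lookahead is zero-width, the lowercase letter itself is not part of the match, so replacing the match (for example with a single space) leaves the following text untouched.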