
Regex to find and fix unmatched XML closing tags in Notepad++


I'm trying to simplify the process of correcting missing closing verse tags in an XML file that looks like this:

    <verse number="21">words words words asdlkjf alsdf. </verse>
    <verse number="22">words words words arbitrary words. 
      <verse number="23">more arbitrary text.</verse>
      <verse number="23">other arbitrary words. </chapter>

I would like to use a regex in Notepad++ to find the end of any line that starts with an arbitrary number of spaces followed by <verse but does not end with </verse>.

With the end of the line matched, I should be able to use Notepad++ find/replace to add the missing tag back in.

Here is what I have so far, which matches every line (the whole line, unfortunately) that starts with spaces and <verse:

^( +<verse).*

Solution

  • This could be what you are looking for (with Search Mode set to "Regular expression" and ". matches newline" unchecked):

    Find: (^\h+<verse(?!.*verse>\h*).*?)((</.*?>\h*)*)$
    Replace: $1</verse>$2

    Given the sample data it will make two replacements, with this result:

        <verse number="21">words words words asdlkjf alsdf. </verse>
        <verse number="22">words words words arbitrary words. </verse>
          <verse number="23">more arbitrary text.</verse>
          <verse number="23">other arbitrary words. </verse></chapter>
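
To check the pattern outside Notepad++, the same find/replace can be sketched in Python. This is only an approximation under a few assumptions: Python's `re` module does not support `\h`, so `[ \t]` stands in for it (it covers spaces and tabs only), the redundant `\h*` inside the lookahead is dropped, and the function name `close_verses` is made up for illustration.

```python
import re

# Approximation of the Notepad++ pattern. Python's re has no \h, so [ \t]
# is used instead (assumption: only spaces and tabs occur in the file).
PATTERN = re.compile(
    r"(^[ \t]+<verse"      # line starts with whitespace, then <verse
    r"(?!.*verse>)"        # no closing "verse>" later on the same line
    r".*?)"                # group 1: the verse text, matched lazily
    r"((</.*?>[ \t]*)*)$",  # group 2: any other closing tags at the line end
    re.MULTILINE,          # ^ and $ match at each line, like Notepad++
)


def close_verses(text: str) -> str:
    """Insert </verse> before any trailing closing tags on unclosed verse lines."""
    return PATTERN.sub(r"\1</verse>\2", text)


# Sample data from the question.
sample = (
    '    <verse number="21">words words words asdlkjf alsdf. </verse>\n'
    '    <verse number="22">words words words arbitrary words. \n'
    '      <verse number="23">more arbitrary text.</verse>\n'
    '      <verse number="23">other arbitrary words. </chapter>\n'
)

if __name__ == "__main__":
    print(close_verses(sample))
```

Because the lookahead rejects any line that already contains `verse>`, running the replacement a second time should leave the text unchanged.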