I am trying to find an orphan ' (a single quote with no matching partner) between < and >, whether the closing > is on the same line, on the next line or further down.
I am a bit new to this. I tried a lazy search like <.*?'.*>, but I can't get it to work.
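A small sketch in Python's re module shows why the lazy attempt falls short. Python is an assumption here: grepWin and NP++ do not use Python's engine, so details may differ. The pattern only requires that some single quote appears after a <, so it matches the balanced tag too. In Python, . also does not match a newline by default, so a tag whose > is on a later line is missed.

```python
import re

# The lazy pattern from the question: '<', as few characters as possible,
# a single quote, then anything up to the last '>' on the same line.
LAZY = re.compile(r"<.*?'.*>")

broken = "<p class=\"quote\" style=' ; dir='ltr'>"
balanced = "<p class=\"quote\" style='indent' ; dir='ltr'>"
multi_line = "<div title='x\n>"  # hypothetical tag whose '>' is on the next line

# Both single-line tags match: the pattern only checks that *a* quote exists,
# not that one is left unpaired.
print(bool(LAZY.search(broken)))      # True
print(bool(LAZY.search(balanced)))    # True
# Without re.DOTALL, '.' stops at '\n', so the multi-line tag is not found.
print(bool(LAZY.search(multi_line)))  # False
```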
Another way to search could be to find lines with an odd number of ' between < and >.
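The odd-count idea can be sketched outside the editor, again assuming Python's re. The hypothetical helper tags_with_odd_quotes finds each <...> span with <[^>]*> and keeps the ones containing an odd number of single quotes. It is a simplification: an apostrophe inside a double-quoted value (title="it's") or a > inside quotes will confuse it.

```python
import re

# Any <...> span; [^>] also matches newlines, so a tag may continue on
# later lines until its closing '>'.
TAG = re.compile(r"<[^>]*>")


def tags_with_odd_quotes(text: str) -> list[str]:
    """Hypothetical helper: return the <...> spans with an odd number of '."""
    return [m.group() for m in TAG.finditer(text) if m.group().count("'") % 2 == 1]


sample = (
    "<p class=\"quote\" style=' ; dir='ltr'>\n"
    "<p class=\"quote\" style='indent' ; dir='ltr'>\n"
    "<div title='multi\n"
    "line>\n"
)

# Reports the first tag (3 quotes) and the multi-line div (1 quote);
# the second tag has 4 quotes and is skipped.
for tag in tags_with_odd_quotes(sample):
    print(repr(tag))
```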
In grepWin or NP++, it should match lines like:
<p class="quote" style=' ; dir='ltr'>
But not:
<p class="quote" style='indent' ; dir='ltr'>
You could use this regex to match those tags:
<(?!(?:[^'">]|'[^']*'|"[^"]*")+>)[^>]*>
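One way to try this regex outside an editor is Python's re module (an assumption; the names broken, balanced and MALFORMED_TAG are illustrative). The sketch checks it against the two example lines from the question. Triple-quoted raw strings avoid escaping the mixed quotes.

```python
import re

# The regex above, unchanged, as a triple-quoted raw string.
MALFORMED_TAG = re.compile(r"""<(?!(?:[^'">]|'[^']*'|"[^"]*")+>)[^>]*>""")

broken = """<p class="quote" style=' ; dir='ltr'>"""
balanced = """<p class="quote" style='indent' ; dir='ltr'>"""

m = MALFORMED_TAG.search(broken)
print(m.group() if m else None)        # <p class="quote" style=' ; dir='ltr'>
print(MALFORMED_TAG.search(balanced))  # None
```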
Piece by piece, the regex matches:
- < : a literal <
- (?! : a negative lookahead for
  - (?:[^'">]|'[^']*'|"[^"]*")+> : one or more of the following, then a >
    - [^'">] : a character which is not a single quote, a double quote or a >
    - '[^']*' : a single-quoted string
    - "[^"]*" : a double-quoted string
- [^>]*> : any number of characters other than >, followed by a >
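Splitting the pattern into the pieces listed above can make it easier to read and modify. This Python sketch rebuilds the same regex from hypothetical named parts, then uses the lookahead body on its own to test whether a tag's quotes balance.

```python
import re

# Hypothetical names for the three alternatives inside the lookahead.
NOT_QUOTE_OR_GT = r"""[^'">]"""
SINGLE_QUOTED = r"'[^']*'"
DOUBLE_QUOTED = r'"[^"]*"'

# A well-formed tag body: one or more of the alternatives, then '>'.
WELL_FORMED_BODY = rf"(?:{NOT_QUOTE_OR_GT}|{SINGLE_QUOTED}|{DOUBLE_QUOTED})+>"

# '<' NOT followed by a well-formed body, then everything up to the next '>'.
MALFORMED_TAG = re.compile(rf"<(?!{WELL_FORMED_BODY})[^>]*>")

# The lookahead body alone recognises a tag whose quotes all balance.
print(bool(re.fullmatch("<" + WELL_FORMED_BODY, """<a b='x' c="y">""")))  # True
print(bool(re.fullmatch("<" + WELL_FORMED_BODY, "<a b='x>")))             # False
```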
The negative lookahead tests whether the text after the < forms a properly formed tag, one in which all quotes are balanced. If it does not, the last part of the regex matches from the < to the next >, which should be the malformed tag.
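Because [^'">] and [^>] also match line breaks, the pattern handles a tag whose closing > is on a later line, which the question asks about. The sketch below assumes Python's re and a made-up sample with one broken multi-line tag among well-formed ones; only the broken span tag should be reported.

```python
import re

MALFORMED_TAG = re.compile(r"""<(?!(?:[^'">]|'[^']*'|"[^"]*")+>)[^>]*>""")

# Hypothetical sample: a good tag, a broken tag whose '>' is on the next
# line, and more good tags. Negated classes such as [^>] match '\n',
# so no DOTALL flag is needed for the match to cross the line break.
text = (
    "<p class=\"quote\" style='indent'>ok</p>\n"
    "<span title='oops\n"
    "  lang=\"en\">x</span>\n"
    "<b>fine</b>\n"
)

# Closing tags such as </p> pass the lookahead, so they are not reported.
for m in MALFORMED_TAG.finditer(text):
    print(repr(m.group()))
```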
Limitations:
- If there is a > inside a properly balanced pair of quotes, the regex will only match as far as that >.
- If there is a < inside a pair of quotes, this regex may match from that point.
Regex demo on regex101
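Both limitations can be reproduced with small made-up inputs, again assuming Python's re. In the first, the match stops at the > inside the quoted title; in the second, the whole tag is well formed, yet the < inside the quotes starts a false-positive match.

```python
import re

MALFORMED_TAG = re.compile(r"""<(?!(?:[^'">]|'[^']*'|"[^"]*")+>)[^>]*>""")

# Limitation 1: a '>' inside a balanced pair of quotes ends the match early.
gt_in_quotes = "<a title='x>y' href='broken>"
print(MALFORMED_TAG.findall(gt_in_quotes))  # ["<a title='x>"]

# Limitation 2: a '<' inside quotes can start a match in a valid tag.
lt_in_quotes = "<a title='x<y'>"
print(MALFORMED_TAG.findall(lt_in_quotes))  # ["<y'>"]
```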