I need a RegEx that finds extraneous <br /> tags that occur before block tags, leaving all other <br /> tags intact.
Here's the text I am searching:
<div>some text<br id="first"/>some more text<br id="second"/></div>
However, when using the following RegEx:
</? *br.*?>(?=</? *([^(br)]).*?)
It selects everything starting at the first <br /> tag, like so:
<br id="first"/>some more text<br id="second"/>
...which isn't what I want. How can I modify the expression so that it only selects <br id="second"/>?
Notes: All inline tags except <br /> tags are stripped out before this point, so they won't be a factor. Also, I am using Obj-C/Cocoa, so I can't use all those fancy PHP functions :). Finally, this will be a valid XHTML document.
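To see why the pattern above overshoots, here is a quick reproduction. It uses Python's re module only because it is easy to run; the assumption is that an ICU-based Cocoa API such as NSRegularExpression treats these constructs the same way. The lazy .*? is still allowed to run past the first >, so the engine keeps extending the match until the lookahead finally succeeds at </div>. Separately, [^(br)] is a character class, not "anything except br".

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = r'</? *br.*?>(?=</? *([^(br)]).*?)'

sample = '<div>some text<br id="first"/>some more text<br id="second"/></div>'

# .*? cannot stop at the first '>' (the lookahead fails there because
# "some more text" follows), so it expands through the second <br>.
match = re.search(ORIGINAL, sample)
print(match.group(0))  # → <br id="first"/>some more text<br id="second"/>

# [^(br)] rejects the single characters '(', 'b', 'r' and ')',
# not the two-letter sequence "br".
print(bool(re.fullmatch(r'[^(br)]', 'd')))  # → True
print(bool(re.fullmatch(r'[^(br)]', 'b')))  # → False
```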
<br[^<>]*>(?=\s*<(?!br))
should do what you want.
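As a quick check against the sample from the question, this sketch runs the pattern through Python's re module (an assumption for illustration; in Cocoa the same pattern would go through an ICU-based API). Only the second <br> is found, because the first one is followed by text rather than by a tag.

```python
import re

# A <br ...> tag followed, after optional whitespace, by any tag except <br>.
EXTRANEOUS_BR = re.compile(r'<br[^<>]*>(?=\s*<(?!br))')

sample = '<div>some text<br id="first"/>some more text<br id="second"/></div>'

matches = [m.group(0) for m in EXTRANEOUS_BR.finditer(sample)]
print(matches)  # → ['<br id="second"/>']
```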
Explanation of the regex:
<br # Match <br
[^<>]* # followed by any number of non-bracket characters
> # and a >.
(?= # Assert that we are right before...
\s* # optional whitespace,
< # followed by any tag
(?!br) # except br
) # (End of lookahead)
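If the goal is to delete the extraneous tags rather than only find them, the same pattern can be handed to a substitution. This is a minimal sketch in Python's re; the helper name strip_extraneous_br is hypothetical. Because the lookahead consumes nothing, any whitespace between the <br /> and the following tag is preserved.

```python
import re

EXTRANEOUS_BR = re.compile(r'<br[^<>]*>(?=\s*<(?!br))')

def strip_extraneous_br(xhtml: str) -> str:
    """Hypothetical helper: remove every <br> that is followed (ignoring
    whitespace) by a tag other than <br>; all other <br> tags are kept."""
    return EXTRANEOUS_BR.sub('', xhtml)

doc = '<div>a<br />b<br />\n</div><p>c<br /><br />d</p>'
# The <br /> before "\n</div>" is removed; the newline itself stays.
# Both <br /> tags in "c<br /><br />d" stay (one precedes a <br>, one precedes text).
print(repr(strip_extraneous_br(doc)))  # → '<div>a<br />b\n</div><p>c<br /><br />d</p>'
```

One consequence of evaluating the lookahead against the original string: in a run such as x<br /><br /></p>, only the last <br /> is removed, since the first one is followed by another <br> at match time.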
Some comments:
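</br> doesn't exist in HTML or XHTML.
There may be no whitespace between < and the tag name (nor may there be whitespace between / and >).
So in valid XHTML a line break is written as <br /> or <br/>, optionally with attributes such as <br id="foo" />; forms like < br /> or <br / > are invalid, which is why the pattern only needs to look for <br.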