Tags: php, html, regex, xpath, domxpath

How can I remove <br/> if no text comes before or after it? DOMxpath or regex?


How can I remove <br/> if no text comes before or after it?

For instance,

<p><br/>hello</p>
<p>hello<br/></p>

they should be rewritten like this,

<p>hello</p>
<p>hello</p>

Should I use DOMXPath, or would regex be better?

(Note: I posted earlier about removing <p><br/></p> with DOMXPath, and then I ran into this issue!)

EDIT:

If I have this in the input,

$content = '<p><br/>hello<br/>hello<br/></p>';

then it should be

<p>hello<br/>hello</p>

Solution

  • To select the br elements in question, you can use:

     "//p[node()[1][self::br]]/br[1] | //p[node()[last()][self::br]]/br[last()]"
    

    or, possibly faster:

     "//p[br]/node()[self::br and (position()=1 or position()=last())]"
    

    Both expressions select a br only when it is the first or the last child node of its p.

    This selects the br in cases such as:

    <p><br/>hello</p>
    <p>hello<br/></p>
    

    and the first and last br in:

    <p><br/>hello<br/>hello<br/></p>
    

    but not a middle br, as in:

    <p>hello<br/>hello</p>
    

    PS: to select the first br of a consecutive pair such as <br/><br/>:

    "//br[following::node()[1][self::br]]"
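
The expressions above only select nodes; removing them is a separate step. A minimal sketch of how that might look in PHP with DOMDocument and DOMXPath follows. The function name stripEdgeBreaks and the html/body wrapper are assumptions made for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper (name chosen for this sketch): removes a <br> that is
// the first or last child node of a <p>, using the second expression above.
function stripEdgeBreaks(string $html): string
{
    $doc = new DOMDocument();

    // Silence libxml warnings about imperfect markup while parsing.
    $previous = libxml_use_internal_errors(true);
    // Assumption: wrapping the fragment in html/body gives a predictable tree.
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = '//p[br]/node()[self::br and (position()=1 or position()=last())]';

    // Copy the matches into an array before modifying the tree.
    foreach (iterator_to_array($xpath->query($query)) as $br) {
        $br->parentNode->removeChild($br);
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo stripEdgeBreaks('<p><br/>hello<br/>hello<br/></p>'), "\n";
```

Two caveats with this sketch: saveHTML() uses HTML serialization, so a remaining br is typically written as <br> rather than <br/>; and whitespace between tags counts as a text node, so in markup like "<p> <br/>hello</p>" the br is no longer the first node and will not be selected.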