php domdocument html-manipulation

Add &nbsp; to Non-tagged HTML Text in PHP


I have an HTML document like this:

<span class="class1">text1</span>
<a href="">link1</a>
<font color=""><b>text2</b></font>
<a href="">link2</a>
text3
<span class="class2">text4</span>

I'd like to surround text1, text2 and text3 with &nbsp;s. What would be the best way? DOMDocument cannot select text that is not wrapped in a tag. For text1 and text2, getElementsByTagName('tagname')->item(0) can be used, but for text3 I'm not sure what to do.

Any ideas?

[Edit]

As Musa suggests, I tried using nextSibling.

<?php
$html = <<<STR
    <span class="class1">text1</span>
    <a href="">link1</a>
    <font color=""><b>text2</b></font>
    <a href="">link2</a>
    text3
    <span class="class2">text4</span>
STR;

$doc = new DOMDocument;
$doc->loadHTML($html);
foreach ($doc->getElementsByTagName('a') as $nodeA) {
    $nodeA->nextSibling->nodeValue = '&nbsp;' . $nodeA->nextSibling->nodeValue . '&nbsp;';
}
echo $doc->saveHtml();
?>

However, &nbsp; gets escaped and converted to &amp;nbsp;.


Solution

  • Since setting nodeValue seems to store the value as text rather than as HTML, you could use the non-breaking space character itself instead of the HTML entity.

    <?php
    $html = <<<STR
        <span class="class1">text1</span>
        <a href="">link1</a>
        <font color=""><b>text2</b></font>
        <a href="">link2</a>
        text3
        <span class="class2">text4</span>
    STR;
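    $nbsp = "\xc2\xa0";   // U+00A0 (non-breaking space) encoded as UTF-8
    $doc = new DOMDocument;
    // Wrap the fragment in a <div> so its top-level nodes can be reached as that div's child nodes.
    $doc->loadHTML('<div>' . $html . '</div>');

    foreach ($doc->getElementsByTagName('div')->item(0)->childNodes as $node) {
        // Only touch text nodes that contain more than whitespace.
        if ($node->nodeType === XML_TEXT_NODE && trim($node->nodeValue) !== '') {
            $node->nodeValue = $nbsp . $node->nodeValue . $nbsp;
        }
    }
    echo $doc->saveHTML();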
    ?>
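
The loop above only visits the direct children of the wrapper div, so it reaches text3 but not text1 or text2, which sit inside other elements. A possible extension is sketched below. It uses DOMXPath to pick up every non-blank text node that is not inside a link. The helper name wrapTextWithNbsp and the rule "everything outside <a>" are assumptions made for illustration, not part of the original answer. Under that rule text4 is wrapped as well, so the XPath expression would need another condition if that is unwanted.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Assumption: "text to wrap" means every non-blank text node outside <a>.
function wrapTextWithNbsp(string $html): string
{
    $nbsp = "\xc2\xa0"; // U+00A0 (non-breaking space) encoded as UTF-8

    $doc = new DOMDocument;
    // Same wrapper <div> as in the answer above.
    $doc->loadHTML('<div>' . $html . '</div>');

    $xpath = new DOMXPath($doc);
    // All text nodes that are not whitespace-only and have no <a> ancestor.
    $query = '//text()[normalize-space(.) != "" and not(ancestor::a)]';

    foreach ($xpath->query($query) as $node) {
        $text  = $node->nodeValue;
        // Keep the original leading/trailing whitespace outside the &nbsp; pair.
        $lead  = substr($text, 0, strlen($text) - strlen(ltrim($text)));
        $trail = substr($text, strlen(rtrim($text)));
        $node->nodeValue = $lead . $nbsp . trim($text) . $nbsp . $trail;
    }

    return $doc->saveHTML();
}

$html = <<<HTML
<span class="class1">text1</span>
<a href="">link1</a>
<font color=""><b>text2</b></font>
<a href="">link2</a>
text3
<span class="class2">text4</span>
HTML;

echo wrapTextWithNbsp($html);
```

When the whole document is serialized with saveHTML(), the inserted character should come out as an entity such as &nbsp; again, while link1 and link2 are left untouched.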