How can I use PHP to remove tags with empty text nodes?


For instance,

<div class="box"></div> remove

<a href="#"></a> remove

<p><a href="#"></a></p> remove

<span style="..."></span> remove

But I want to keep any tag that contains a text node, like this:

<a href="#">link</a> keep

Edit:

I also want to remove messy nested markup like this:

<p><strong><a href="http://xx.org.uk/dartmoor-arts"></a></strong></p>
<p><strong><a href="http://xx.org.uk/depw"></a></strong></p>
<p><strong><a href="http://xx.org.uk/devon-guild-of-craftsmen"></a></strong></p>

I tested both of the regexes below:

$content = preg_replace('!<(.*?)[^>]*>\s*</\1>!','',$content);
$content = preg_replace('%<(.*?)[^>]*>\\s*</\\1>%', '', $content);

But they leave something like this behind:

<p><strong></strong></p>
<p><strong></strong></p>
<p><strong></strong></p>

Solution

  • One way could be:

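    $dom = new DOMDocument();
    $dom->loadHTML(
        '<p><strong><a href="http://xx.org.uk/dartmoor-arts">test</a></strong></p>
        <p><strong><a href="http://xx.org.uk/depw"></a></strong></p>
        <p><strong><a href="http://xx.org.uk/devon-guild-of-craftsmen"></a></strong></p>'
    );

    $xpath = new DOMXPath($dom);

    // not(node()) selects elements with no child nodes at all; it already
    // covers not(text()). Removing such an element can leave its parent
    // empty, so repeat the query until nothing matches.
    while (($nodeList = $xpath->query('//*[not(node())]')) && $nodeList->length > 0) {
        foreach ($nodeList as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    // Prints the whole document, including the wrappers loadHTML() adds.
    echo $dom->saveHTML();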
    

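    You will probably need to adapt this to your needs. For example, this XPath also matches elements that never have children, such as <br> or <img>, and it may not match elements that contain only whitespace.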
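
As one possible adaptation, here is a minimal sketch that treats whitespace-only elements as empty, keeps a few childless elements, and returns only the fragment rather than a full document. The function name `removeEmptyElements`, the list of kept elements and the `logo.png` sample are assumptions made for illustration; they are not part of the original answer. It also assumes plain ASCII input, since non-ASCII text may need an explicit encoding hint for `loadHTML()`.

```php
<?php
// Hypothetical helper, not part of the original answer.
function removeEmptyElements(string $html): string
{
    if (trim($html) === '') {
        return '';
    }

    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html); // the parser wraps the fragment in <html><body>
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Assumption: these childless elements are meaningful and must survive.
    $kept = 'self::img or self::br or self::hr or self::input';

    // Match elements under <body> whose text content is empty or whitespace,
    // unless the element is one of the kept ones or contains one.
    $query = '//body//*[not(normalize-space())'
           . ' and not(' . $kept . ')'
           . ' and not(descendant::*[' . $kept . '])]';

    // One pass is enough: nested empty elements all match the same query.
    foreach ($xpath->query($query) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize only the children of <body>, without doctype or wrappers.
    $body = $dom->getElementsByTagName('body')->item(0);
    $out = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $dom->saveHTML($child);
        }
    }
    return $out;
}

$html = '<p><strong><a href="http://xx.org.uk/dartmoor-arts">test</a></strong></p>'
      . '<p><strong><a href="http://xx.org.uk/depw"></a></strong></p>'
      . '<p> <span style="color:red"> </span> </p>'
      . '<p>logo <img src="logo.png"></p>';

echo removeEmptyElements($html);
```

With this input, the first paragraph and the paragraph holding the image should survive, while the empty link chain and the whitespace-only paragraph are dropped. Using `normalize-space()` instead of `not(node())` is a design choice here: it judges an element by its text content, so a whole empty chain such as `<p><strong><a></a></strong></p>` matches in a single query and no loop is needed.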