How can I use PHP to remove tags that contain no text?
For instance,
<div class="box"></div>
remove
<a href="#"></a>
remove
<p><a href="#"></a></p>
remove
<span style="..."></span>
remove
But I want to keep tags that contain a text node, like this,
<a href="#">link</a>
keep
Edit:
I want to remove something messy like this too,
<p><strong><a href="http://xx.org.uk/dartmoor-arts"></a></strong></p>
<p><strong><a href="http://xx.org.uk/depw"></a></strong></p>
<p><strong><a href="http://xx.org.uk/devon-guild-of-craftsmen"></a></strong></p>
I tested both of the regexes below,
$content = preg_replace('!<(.*?)[^>]*>\s*</\1>!','',$content);
$content = preg_replace('%<(.*?)[^>]*>\\s*</\\1>%', '', $content);
But they leave something like this,
<p><strong></strong></p>
<p><strong></strong></p>
<p><strong></strong></p>
One way could be:
$dom = new DOMDocument();
$dom->loadHTML(
'<p><strong><a href="http://xx.org.uk/dartmoor-arts">test</a></strong></p>
<p><strong><a href="http://xx.org.uk/depw"></a></strong></p>
<p><strong><a href="http://xx.org.uk/devon-guild-of-craftsmen"></a></strong></p>'
);
$xpath = new DOMXPath($dom);
// not(node()) matches elements with no children at all (no text, no child elements).
// Removing an empty element can leave its parent empty, so repeat until nothing matches.
while (($nodeList = $xpath->query('//*[not(node())]')) && $nodeList->length > 0) {
    foreach ($nodeList as $node) {
        $node->parentNode->removeChild($node);
    }
}
echo $dom->saveHTML();
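You will probably need to adapt this to your needs. As written, the query matches every element that has no child nodes, so it also removes void elements such as <br> and <img>, and loadHTML() wraps the fragment in a doctype plus <html> and <body> tags in the output.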
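If elements that hold only whitespace should go too, while elements such as images should stay, a variation along the following lines might work. This is only a sketch under stated assumptions: the function name removeEmptyElements(), the wrapper id "fragment-root", and the list of elements to keep (br, hr, img, input) are made up for illustration and are not part of the approach above. It uses normalize-space() to treat whitespace-only content as empty, and it serializes only the wrapper's children so that no doctype, <html>, or <body> appears in the result.

```php
<?php
// Sketch only: removeEmptyElements() and the "fragment-root" id are
// hypothetical names chosen for this example.
function removeEmptyElements(string $html): string
{
    $dom = new DOMDocument();
    // Silence parser warnings on messy markup, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so its contents can be extracted again afterwards.
    $dom->loadHTML('<div id="fragment-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Assumption: these elements are meaningful without text and must survive.
    $keep = 'self::br or self::hr or self::img or self::input';
    // Match elements with no non-whitespace text that neither are
    // nor contain one of the kept elements.
    $query = '//div[@id="fragment-root"]//*[not(normalize-space())'
           . ' and not(descendant-or-self::*[' . $keep . '])]';

    // Every descendant of a matched element also matches, so one pass is enough.
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize only the wrapper's children, without doctype, <html>, or <body>.
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo removeEmptyElements(
    '<p><strong><a href="http://xx.org.uk/dartmoor-arts">test</a></strong></p>'
    . '<p><strong><a href="http://xx.org.uk/depw"></a></strong></p>'
    . '<p> <span style="color:red"> </span></p>'
    . '<p><img src="a.png"></p>'
);
```

With this input, the first and last paragraphs should remain and the two in the middle should be dropped. Extend the $keep list if other elements (for example iframe or video) count as content in your markup.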