Having:
$content=
'<div id="parent">
<div class="children">
This is short content
</div>
<div class="children">
This is a very long content even longer than the Short content
</div>
<p>
This is a Short content in a paragraph
</p>
This is a Short content without a html element
</div>';
I can remove nodes using DOMDocument by class (or id) like this:
$dom = new DOMDocument();
$dom->loadHTML($content);
$xpath = new DOMXPath($dom);
// Remove the first div with class "children", if there is one
if ($divToRemove = $xpath->query('.//div[@class="children"]')->item(0)) {
    $divToRemove->parentNode->removeChild($divToRemove);
}
$content = $dom->saveHTML();
Using the above code, I can remove the first div from $content. But how can I remove children that have a short inner text, for example shorter than 20 characters?
EDIT
I have no idea about the child element. It can be a <div> or a <p> or something else.
I want to remove every short-length child of the parent <div>.
Is there any XPath query to select nodes by the length of their text?
This is what I want as output:
$content=
'<div id="parent">
<div class="children">
This is a very long content even longer than the Short content
</div>
</div>';
The div and p element nodes are not the nodes that hold the strings; those are always text nodes. However, nodes can be cast to strings in XPath. Two string functions are needed here.
string-length()
Returns the character length of a string. If a node list is provided, the first node of the list is cast into a string.
normalize-space()
Converts every group of whitespace characters in a string to a single space and strips whitespace from the start and end.
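As a minimal sketch of how the two functions combine, the snippet below measures the same paragraph with and without normalization. The sample markup and variable names are made up for illustration and are not part of the question.

```php
<?php
// Hypothetical sample markup with extra whitespace inside the paragraph
$dom = new DOMDocument();
$dom->loadHTML('<div id="parent"><p>  Hello   world  </p></div>');
$xpath = new DOMXPath($dom);

// The p element is cast to a string (its text content) before measuring
$rawLength = $xpath->evaluate('string-length(//p)');

// Whitespace runs collapse to single spaces, leading/trailing ones are stripped
$normalizedLength = $xpath->evaluate('string-length(normalize-space(//p))');

var_dump($rawLength, $normalizedLength);
```

The normalized length counts only "Hello world", so indentation and line breaks in the markup do not make a short text look long.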
But first get some context:
$context = $xpath->evaluate('//div[@id = "parent"]')->item(0);
Now build an expression for nodes with short content:
All kinds of nodes: elements, text nodes, comments, ...
node()
... with a string length less than or equal to 50 after normalizing whitespace:
node()[string-length(normalize-space(.)) <= 50]
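To see which kinds of nodes this expression selects, here is a small sketch that collects the node names of the matches instead of removing them. The markup is a compacted stand-in for the example from the question, and `$matched` is just an illustrative variable.

```php
<?php
// Compacted stand-in for the question's markup (no whitespace between children)
$html = '<div id="parent">'
      . '<div class="children">This is short content</div>'
      . '<div class="children">This is a very long content even longer than the Short content</div>'
      . '<p>This is a Short content in a paragraph</p>'
      . 'This is a Short content without an element'
      . '</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$context = $xpath->evaluate('//div[@id = "parent"]')->item(0);

// Collect the names of all child nodes with at most 50 characters of text
$matched = [];
foreach ($xpath->evaluate('node()[string-length(normalize-space(.)) <= 50]', $context) as $node) {
    $matched[] = $node->nodeName;
}

print_r($matched);
```

The match list contains the short div, the p, and the bare text node (reported as #text), which is why node() is used here rather than an element-only test such as *.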
Put together:
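$dom = new DOMDocument();
$dom->loadHTML($content);
$xpath = new DOMXPath($dom);
// The parent div is the context for the relative expression below
$context = $xpath->evaluate('//div[@id = "parent"]')->item(0);
$maxLength = 50;
// Child nodes of any type whose normalized text is at most $maxLength characters
$expression = 'node()[string-length(normalize-space(.)) <= '.$maxLength.']';
foreach ($xpath->evaluate($expression, $context) as $node) {
    $node->parentNode->removeChild($node);
}
// Save only the parent div, not the whole document
echo $dom->saveHTML($context);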
Output:
<div id="parent"><div class="children">
This is a very long content even longer than the Short content
</div></div>
The context is used to save only the original div as HTML. DOMDocument::loadHTML() will add html and body elements.
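The difference can be sketched by saving the same document both ways. The markup and the variable names `$full` and `$fragment` are illustrative assumptions, not taken from the question.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<div id="parent"><p>Some text</p></div>');
$xpath = new DOMXPath($dom);
$context = $xpath->evaluate('//div[@id = "parent"]')->item(0);

// Without an argument the whole document is serialized,
// including the html and body elements added by the parser
$full = $dom->saveHTML();

// With a node argument only that node and its descendants are serialized
$fragment = $dom->saveHTML($context);

echo $fragment;
```

Passing the node keeps the wrapper elements out of the result, so the string can go back into $content without any cleanup.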
It does not make a difference for this example, but I suggest using DOMXPath::evaluate() for all XPath expressions. DOMXPath::query() does not support XPath expressions that return scalar values. See: https://stackoverflow.com/a/23796070/2265374
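A short sketch of what evaluate() returns for different kinds of expressions, using made-up markup and variable names:

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML(
    '<div id="parent"><div class="children">One</div><div class="children">Two</div></div>'
);
$xpath = new DOMXPath($dom);

// A location path yields a DOMNodeList, just like query()
$nodes = $xpath->evaluate('//div[@class = "children"]');

// count() yields a number, which PHP returns as a float
$count = $xpath->evaluate('count(//div[@class = "children"])');

// string() casts the first matching node to a string
$first = $xpath->evaluate('string(//div[@class = "children"])');

// A comparison yields a boolean
$isShort = $xpath->evaluate('string-length(//div[@class = "children"]) < 20');

var_dump($nodes->length, $count, $first, $isShort);
```

With evaluate() the same call handles node lists and scalar results, so there is no need to switch methods when an expression changes.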