I want to remove all script elements. Here is my code:
<?php
$pageFile = <<<EOF
<!DOCTYPE html><html><body>
<script src="aa"></script>
<script src="bb"></script>
<script src="cc"></script>
<div>aaa</div>
</body></html>
EOF;
$dom = new DOMDocument();
$dom->loadHTML($pageFile);
foreach ($dom->getElementsByTagName('script') as $item) {
    $item->parentNode->removeChild($item);
}
$pageFile = $dom->saveHTML();
echo $pageFile;
But one script element still remains in the output.
Result:
<!DOCTYPE html>
<html><body>
<script src="bb"></script><div>aaa</div>
</body></html>
The DOMNodeList returned by $dom->getElementsByTagName() is "live". When you remove a script element from the document, it also disappears from the node list, and every element after it shifts down one index. The foreach loop then advances to the next index, so it ends up skipping every other element.
To fix this, convert the node list to an array first, so the loop iterates over a static snapshot:
foreach (iterator_to_array($dom->getElementsByTagName('script')) as $item) {
    $item->parentNode->removeChild($item);
}
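If you would rather not copy the list, another common approach is to walk the live list from the end, so that a removal never shifts an index you have not visited yet. The following is only a minimal sketch using the same markup as the question; the variable names $html and $scripts are illustrative and not part of the original code.

```php
<?php
// Same markup as in the question, on one line for brevity.
$html = '<!DOCTYPE html><html><body>'
    . '<script src="aa"></script>'
    . '<script src="bb"></script>'
    . '<script src="cc"></script>'
    . '<div>aaa</div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Live DOMNodeList: its length shrinks as nodes are removed.
$scripts = $dom->getElementsByTagName('script');

// Iterate backwards so removing item $i only affects indexes above $i,
// which have already been processed.
for ($i = $scripts->length - 1; $i >= 0; $i--) {
    $node = $scripts->item($i);
    $node->parentNode->removeChild($node);
}

echo $dom->saveHTML();
```

Both variants should leave no script elements in the document; the array copy is arguably easier to read, while the backward loop avoids building a temporary array.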