I'm editing a database through phpMyAdmin that stores several HTML pages, and I would like to remove, from all of these pages, every <div> (and any other tag) that has a certain class or id.
Example:
Case 1
<div class="undesirable">
<div class="container">
<div class="row">
<div class="col1"></div>
</div>
</div>
</div>
Case 2
<div class="undesirable">
<div class="container">
<div class="row">
<div class="col1"></div>
<div class="col2"></div>
</div>
</div>
</div>
I would like to remove every <div> that has class="undesirable". In some cases the class may instead appear as class="pre_undesirable", or something similar.
Initially I thought of using regex, but because the HTML varies from page to page, the pattern keeps breaking: there is no way to know where the matching closing </div> will be.
The answer is possibly an HTML parser, but I don't understand how to use one. Any pointers on where to start?
Since you are dealing with HTML, you should probably use an HTML parser and locate the elements to remove with XPath. To demonstrate, I'll change your HTML a bit:
$original=
'<html><body>
<div class="undesirable">
<div class="container">
<div class="row">
<div class="col1"></div>
</div>
</div>
</div>
<div class="keepme">
<div class="container">
<div class="row">
<div class="col1"></div>
<div class="col2"></div>
</div>
</div>
</div>
<div class="pre_undesirable">
<div class="container">
<div class="row">
<div class="col1"></div>
<div class="col2"></div>
</div>
</div>
</div>
<div class="keepme">
<div class="container">
<div class="row">
<div class="col1"></div>
<div class="col2"></div>
</div>
</div>
</div>
</body>
</html>
';
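// Parse the markup into a DOM tree.
$HTMLDoc = new DOMDocument();
$HTMLDoc->loadHTML($original);
$xpath = new DOMXPath($HTMLDoc);
// contains() is a substring match, so this also catches class="pre_undesirable".
$targets = $xpath->query('//div[contains(@class,"undesirable")]');
foreach ($targets as $target) {
    // Removing the node also removes everything nested inside it.
    $target->parentNode->removeChild($target);
}
echo $HTMLDoc->saveHTML();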
The output should include only the two "keepme" <div>s.
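Since the question also mentions other tags and id attributes, and the pages live in database rows rather than in complete documents, the same idea could be wrapped in a small helper. This is only a sketch under some assumptions: the function name removeElementsMatching and the wrapper id gr-fragment-root are made up for illustration, $needle is assumed to be a non-empty string without double quotes, and the stored HTML is assumed to be a fragment rather than a full page.

```php
<?php
// Hypothetical helper (not part of the original answer): removes every element
// whose class or id attribute contains $needle from an HTML fragment.
function removeElementsMatching(string $html, string $needle): string
{
    // Collect parser warnings (e.g. for HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Wrap the fragment in a known container so that it can be extracted again
    // without the <html>/<body> elements that loadHTML() adds.
    $doc->loadHTML('<div id="gr-fragment-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="gr-fragment-root"]')->item(0);

    // Any tag (*) below the wrapper whose class or id contains the needle.
    $expr = sprintf('.//*[contains(@class, "%1$s") or contains(@id, "%1$s")]', $needle);

    // Copy the matches into an array first, then remove them one by one.
    foreach (iterator_to_array($xpath->query($expr, $root)) as $target) {
        $target->parentNode->removeChild($target);
    }

    // Serialize only the wrapper's children, i.e. the cleaned fragment.
    $result = '';
    foreach ($root->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

// Example call on a small fragment.
$page = '<p id="intro">Hi</p>'
      . '<div class="undesirable"><span>x</span></div>'
      . '<section id="pre_undesirable_box">y</section>'
      . '<div class="keepme"><div class="col1"></div></div>';
echo removeElementsMatching($page, 'undesirable');
```

Each stored page would then be read from its row, passed through the helper, and written back with an UPDATE. Keep in mind that contains() matches substrings, so a class such as "not_undesirable_at_all" would be removed as well. Text with non-ASCII characters may also need an explicit encoding hint before loadHTML(), since the parser does not assume UTF-8 by default.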