I have some HTML that contains (among other things) <p> tags and <figure> tags that each contain one <img> tag.
For the sake of simplicity I'll define an example of what can be found in the HTML here in a PHP variable:
$content = '<figure class="image image-style-align-left">
<img src="https://placekitten.com/g/200/300"></figure>
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p>';
I load $content into a DOMDocument and, in this example, change the src attribute of every <img> element inside a <figure> element:
$dom = new DOMDocument();
libxml_use_internal_errors(true);
// this needs to be encoded otherwise special characters get messed up.
$domPart = mb_convert_encoding($content, 'HTML-ENTITIES', "UTF-8");
$dom->loadHTML($domPart, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$domFigures = $dom->getElementsByTagName('figure');
foreach ($domFigures as $domFigure) {
    $img = $domFigure->getElementsByTagName('img')[0];
    if ($img) {
        $img->setAttribute('src', "https://placekitten.com/g/400/500");
    }
}
$result = $dom->saveHTML();
The result is:
<figure class="image image-style-align-left">
<img src="https://placekitten.com/g/400/500">
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p>
</figure>
Somehow my <p> element has been moved into my <figure> element. Why does this happen, and what can I do to prevent it?
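A DOMDocument has to have a single root element. Because you pass LIBXML_HTML_NOIMPLIED, libxml does not add the implied <html> and <body> wrappers, so your fragment ends up with two top-level elements, and the parser moves all following siblings inside the first top-level element.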
The simplest fix is to wrap your content in a single container element, e.g.
$content = '<div><figure class="image image-style-align-left">
<img src="https://placekitten.com/g/200/300"></figure>
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p></div>';
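With the wrapper in place, saveHTML() will also output the extra <div>. Below is a minimal sketch, assuming you want the fragment back without the wrapper: it serializes only the wrapper's children by passing each node to saveHTML(). The function name replaceFigureImageSrc is hypothetical, and the mb_convert_encoding step from the question is left out for brevity (add it back before loadHTML() if your content contains non-ASCII characters).

```php
<?php

// Hypothetical helper (not from the original post): wraps the fragment in a
// <div> so the document has a single root element, rewrites the src of the
// first <img> inside each <figure>, and returns the fragment without the wrapper.
function replaceFigureImageSrc(string $content, string $newSrc): string
{
    $dom = new DOMDocument();
    // Buffer libxml warnings (e.g. about HTML5 tags like <figure>) instead of printing them.
    $previous = libxml_use_internal_errors(true);

    // Wrap the fragment so it has exactly one top-level element.
    $dom->loadHTML('<div>' . $content . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('figure') as $figure) {
        $img = $figure->getElementsByTagName('img')->item(0);
        if ($img instanceof DOMElement) {
            $img->setAttribute('src', $newSrc);
        }
    }

    // Serialize only the wrapper's children so the temporary <div> is dropped.
    $result = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}

$content = '<figure class="image image-style-align-left">
<img src="https://placekitten.com/g/200/300"></figure>
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p>';

echo replaceFigureImageSrc($content, 'https://placekitten.com/g/400/500');
```

Passing a node to saveHTML() keeps the serialization limited to that subtree, which avoids string-trimming the wrapper tags off the full document output.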