I have a string containing multiple <html><body><div>Content</div></body></html> documents. I want to extract the content of each one and join everything into a single valid structure. For example:
<html><body><div>Content</div></body></html>
<html><body><div>Content</div></body></html>
<html><body><div>Content</div></body></html>
Should be:
<html>
<body>
<div>Content</div>
<div>Content</div>
<div>Content</div>
</body>
</html>
My current code looks like this:
libxml_use_internal_errors(true);
$newDom = new DOMDocument();
$newBody = "";
$newDom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
$bodyTags = $newDom->getElementsByTagName("body");
foreach ($bodyTags as $body) {
    $newBody .= $newDom->saveHTML($body);
}
$newBody now contains all of the body tags:
<body><div>Content</div></body>
<body><div>Content</div></body>
<body><div>Content</div></body>
How can I save only the inner HTML of each body tag in $newBody?
Edit:
Based on @NigelRen's answer, this is my solution:
libxml_use_internal_errors(true);
$newDom = new DOMDocument();
$newBody = '';
$newDom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
$bodyTags = $newDom->getElementsByTagName("body");
foreach ($bodyTags as $body) {
    foreach ($body->childNodes as $node) {
        $newBody .= $newDom->saveHTML($node);
    }
}
$newDom = new DOMDocument();
$newDom->loadHTML(mb_convert_encoding($newBody, 'HTML-ENTITIES', 'UTF-8'));
$newBody = $newDom->saveHTML();
It's awkward because loadHTML() attempts to repair the HTML in your original document, so the resulting structure may not be what you expect.
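To see what the parser actually built from the concatenated input, one option is to print an outline of the element tree. This is only a sketch: `outlineElements` is a hypothetical helper name (not part of the DOM extension), the sample input is ASCII-only so the encoding conversion is left out, and the exact nesting you see depends on the libxml version behind your PHP build.

```php
<?php
// Hypothetical helper: returns one line per element, indented by its depth.
function outlineElements(DOMNode $node, int $depth = 0): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        // Skip text nodes, the doctype, comments, and so on
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $out .= str_repeat('  ', $depth) . $child->nodeName . "\n";
            $out .= outlineElements($child, $depth + 1);
        }
    }
    return $out;
}

$html = '<html><body><div>Content1</div></body></html>
<html><body><div>Content2</div></body></html>
<html><body><div>Content3</div></body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

// Shows where each <html>, <body> and <div> ended up after the repair
$outline = outlineElements($dom);
echo $outline;
```

Comparing this outline with the input makes it easier to judge whether looping over the <body> elements will pick up everything you need.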
However, if the document has a basic outline like yours, the following copies the contents of the <body> tags into a new document (see the comments in the code):
$html = '<html><body><div>Content1</div></body></html>
<html><body><div>Content2</div></body></html>
<html><body><div>Content3</div></body></html>';
libxml_use_internal_errors(true);
$newDom = new DOMDocument();
// New document with final code
$newBody = new DOMDocument();
$newDom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
// Set up a basic template for the new document
$newBody->loadHTML("<html><body /></html>");
// Find where to add any new content
$addBody = $newBody->getElementsByTagName("body")[0];
// Find the existing content to add
$bodyTags = $newDom->getElementsByTagName("body");
foreach ($bodyTags as $body) {
    // Add all of the contents of the <body> tag into the new document
    foreach ($body->childNodes as $node) {
        // Import the node to copy to the new document and add it in
        $addBody->appendChild($newBody->importNode($node, true));
    }
}
echo $newBody->saveHTML();
which gives...
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div>Content1</div><div>Content2</div><div>Content3</div></body></html>
The limitations are that any content outside of the <body> tags and any attributes of the <body> tag are not preserved.
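If the <body> attributes matter, one possible extension is to copy them onto the new <body> while importing the child nodes. This is a sketch under assumptions, not part of the code above: the variable names `$source`, `$target` and `$targetBody` are made up for the example, the sample input is ASCII-only so the encoding conversion is left out, and when two <body> tags carry the same attribute the later one simply overwrites the earlier one, which may not be the policy you want.

```php
<?php
$html = '<html><body class="page"><div>Content1</div></body></html>
<html><body><div>Content2</div></body></html>';

libxml_use_internal_errors(true);

// Parsed version of the concatenated input
$source = new DOMDocument();
$source->loadHTML($html);

// Empty template that receives the merged content
$target = new DOMDocument();
$target->loadHTML('<html><body></body></html>');
$targetBody = $target->getElementsByTagName('body')->item(0);

foreach ($source->getElementsByTagName('body') as $body) {
    // Assumption: on a name clash the later <body> wins
    foreach ($body->attributes as $attr) {
        $targetBody->setAttribute($attr->nodeName, $attr->nodeValue);
    }
    // Same deep copy of the child nodes as in the code above
    foreach ($body->childNodes as $node) {
        $targetBody->appendChild($target->importNode($node, true));
    }
}

$result = $target->saveHTML();
echo $result;
```

Content outside of the <body> tags, such as anything in <head>, is still dropped by this sketch.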