How to remove the outer tags from a node in PHP

I have the following HTML code:

$pageHTML = '<html>
<div class="some class">

and I need to remove the outer tags of the <div> while keeping all of its inner HTML inside the <body>.

If I try

$dom = new DOMDocument;

$bodyDivs = [];
foreach($dom->getElementsByTagName('body')[0]->childNodes as $bodyChild) {
    if($bodyChild->nodeName == 'div') {
        $bodyDivs[] = $bodyChild;

if(count($bodyDivs) == 1) {
    foreach($bodyDivs[0]->childNodes as $divChild) {

the <div> is removed, but its children are not appended to the <body> before the removal.

If I try a reverse loop like

$k = count($bodyDivs[0]->childNodes);
for($n = $k-1; $n >= 0; $n--) {

the children are added to the <body>, but in reverse order.

So I get


but I need


How can I resolve this?


  • Your original code is very close; it is just missing one key point.

    Original code

    foreach($bodyDivs[0]->childNodes as $divChild) {

    Iterating over a list of nodes with foreach while also removing nodes from that same list (in your case, by moving them to the <body>) does not behave as you intended, because childNodes is a live list that changes as soon as a node is moved or removed.
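    To see why the list cannot be treated as a fixed array, here is a minimal sketch of its "live" behaviour. The three-element document is a made-up sample and not the markup from the question; only standard DOM extension calls are used.

```php
<?php
// Hypothetical three-child document, used only for illustration.
$doc = new DOMDocument;
$doc->loadXML('<root><a/><b/><c/></root>');

$children = $doc->documentElement->childNodes; // live DOMNodeList
$lengthBefore = $children->length;             // 3

// Remove the first child through the parent, not through $children.
$doc->documentElement->removeChild($children->item(0));

$lengthAfter = $children->length;              // 2: the same list object shrank
$firstName = $children->item(0)->nodeName;     // "b": item(0) now refers to the next node

echo "$lengthBefore -> $lengthAfter, first is now <$firstName>\n";
```

    The list was never reassigned, yet its length and contents changed, which is exactly what confuses a foreach that is walking it at the same time.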

    Simplified, complete example for demonstration purposes:

    $doc = new DOMDocument;
    $parent = $doc->documentElement;
    foreach ($parent->childNodes as $child) {
    echo $doc->saveXML();

    This outputs the following:

    <?xml version="1.0"?>

    Totally sensible, right?! Fear not, we can do better.

    What to do?

    A common approach that does behave as intended is to loop until the list is empty, always taking the first remaining node.

    $doc = new DOMDocument;
    $parent = $doc->documentElement;
    while ($parent->childNodes->length > 0) {
        $child = $parent->childNodes->item(0);
    echo $doc->saveXML();
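    For reference, a self-contained sketch of that drain-the-list loop follows. The sample XML is an assumption made for illustration, and the $removed array exists only to record the order in which nodes are visited.

```php
<?php
// Hypothetical sample document, used only for illustration.
$doc = new DOMDocument;
$doc->loadXML('<root><a/><b/><c/></root>');
$parent = $doc->documentElement;

$removed = []; // records the visiting order, for demonstration only

// Always take the current first child; the live list shrinks by one each pass.
while ($parent->childNodes->length > 0) {
    $child = $parent->childNodes->item(0);
    $removed[] = $child->nodeName;
    $parent->removeChild($child);
}

echo implode(',', $removed), "\n"; // a,b,c
echo $doc->saveXML();
```

    Because the loop condition is re-evaluated against the live list on every pass, no node is skipped and the original order is kept.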

    Applied to your code

    All of the above means that your original foreach:

    foreach($bodyDivs[0]->childNodes as $divChild) {

    can be replaced with a while loop:

    while ($bodyDivs[0]->childNodes->length > 0) {
        $divChild = $bodyDivs[0]->childNodes->item(0);

    Aside: I used the ->item(0) notation above rather than [0], as that's more conventional.
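
    Putting it together, one possible end-to-end sketch is shown below. The input HTML is hypothetical, since the markup in the question is not shown in full. The use of insertBefore() to place each child in front of the <div> is also my assumption, chosen so the children keep their position relative to any other siblings in the <body>.

```php
<?php
// Hypothetical input; only the class name is taken from the question.
$pageHTML = '<html><body><div class="some class"><p>one</p><p>two</p><p>three</p></div></body></html>';

$dom = new DOMDocument;
$dom->loadHTML($pageHTML);

$body = $dom->getElementsByTagName('body')->item(0);
$div  = $body->getElementsByTagName('div')->item(0);

// Move each child in front of the <div>, so document order is preserved.
while ($div->childNodes->length > 0) {
    $div->parentNode->insertBefore($div->childNodes->item(0), $div);
}
// The <div> is now empty and can be dropped.
$div->parentNode->removeChild($div);

// Read-only loop over the live list, so foreach is safe here.
$texts = [];
foreach ($body->childNodes as $node) {
    $texts[] = $node->textContent;
}
echo implode(', ', $texts), "\n"; // one, two, three
echo $dom->saveHTML($body), "\n";
```

    If the <div> is always the last child of the <body>, appending each child to the <body> instead would give the same result.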