
Why does this DOMDocument code not work when trying to rearrange elements?


I am trying to take HTML in this pattern...

<p>a</p>
<p>b</p>
<p>c</p>
...
<h3>title</h3>

<p>d</p>
<p>e</p>
<p>f</p>
...
<h3>title2</h3>
...

...and turn it into...

<ul>
  <li>
     <blockquote>
        <p>a</p>
        <p>b</p>
        <p>c</p>
        <cite>title</cite>
     </blockquote>
  </li>
  <li>
     <blockquote>
        <p>d</p>
        <p>e</p>
        <p>f</p>
        <cite>title2</cite>
     </blockquote>
  </li>
</ul>

The PHP code I have is...

$dom = new DOMDocument('1.0', 'utf-8');

$dom->preserveWhiteSpace = FALSE;

// loadHTML() wraps the fragment in implied <html> and <body> elements.
$dom->loadHTML($content);

$ul = $dom->createElement('ul');

$body = $dom->getElementsByTagName('body')->item(0);

$blockquote = FALSE;

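// Walk the top-level children of <body>.
foreach($body->childNodes as $element) {

    // Skip non-element nodes such as whitespace text between tags.
    if ($element->nodeType != XML_ELEMENT_NODE) {
        continue;
    }

    // Start a new <li> and <blockquote> for each group of paragraphs.
    if ( ! $blockquote) {
        $blockquote = $dom->createElement('blockquote');
        $li = $dom->createElement('li');
    }

    switch ($element->nodeName) {

        case 'p':
            // appendChild() moves the <p> out of <body> into the blockquote.
            $blockquote->appendChild($element);

            break;
        case 'h3':
            // The heading closes the group (converting it to <cite> is not done yet).
            $li->appendChild($blockquote);

            $ul->appendChild($li);

            $blockquote = $li = FALSE;
            break;

    }
}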

$body->appendChild($ul); 
echo $dom->saveHTML();

Whilst the functionality is not finished yet, I noticed that the loop stops early as soon as I add $blockquote->appendChild($element).

If I remove all the appendChild() calls, the loop visits every element as expected.

My guess is that moving the current element while the loop is iterating over it breaks the loop.

How would I get this to work?
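The guess above is plausible: $body->childNodes is a live DOMNodeList, so moving a node out of <body> mid-loop changes the very list that foreach is walking. One common workaround (not the approach taken in the solution below) is to copy the children into a plain PHP array with iterator_to_array() before moving anything. This is a minimal sketch, assuming the input is a fragment like the one in the question; the function name wrapQuotes() is hypothetical, and turning each <h3> into a <cite> that keeps only the heading's text is an assumption about the desired output.

```php
<?php
// Sketch: snapshot <body>'s children into a plain array before moving any of
// them, so moving a node cannot disturb the iteration.
// wrapQuotes() is a hypothetical name used for illustration only.
function wrapQuotes(string $content): string
{
    $dom = new DOMDocument('1.0', 'utf-8');
    $dom->loadHTML($content);

    $body = $dom->getElementsByTagName('body')->item(0);
    $ul = $dom->createElement('ul');
    $blockquote = null;

    // iterator_to_array() copies the live DOMNodeList into a static array.
    foreach (iterator_to_array($body->childNodes) as $element) {
        if ($element->nodeType !== XML_ELEMENT_NODE) {
            continue; // skip whitespace text between tags
        }
        if ($blockquote === null) {
            $blockquote = $dom->createElement('blockquote');
        }
        switch ($element->nodeName) {
            case 'p':
                // Moves the <p> out of <body>; safe because we loop over a copy.
                $blockquote->appendChild($element);
                break;
            case 'h3':
                // Assumption: the <cite> keeps only the heading's text.
                $cite = $dom->createElement('cite');
                $cite->appendChild($dom->createTextNode($element->textContent));
                $blockquote->appendChild($cite);
                $body->removeChild($element);

                $li = $dom->createElement('li');
                $li->appendChild($blockquote);
                $ul->appendChild($li);
                $blockquote = null;
                break;
        }
    }
    // Keep any trailing paragraphs that have no <h3> after them.
    if ($blockquote !== null && $blockquote->hasChildNodes()) {
        $li = $dom->createElement('li');
        $li->appendChild($blockquote);
        $ul->appendChild($li);
    }
    $body->appendChild($ul);
    return $dom->saveHTML($ul);
}
```

The design choice here is to keep working in a single document and only change what the loop iterates over; the original document is still modified in place, so <body> ends up containing the new <ul>.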


Solution

  • If it chokes when inserting into the current document, have you considered building a new document instead? You can use the DOMDocument::importNode() method to copy just the nodes you need from the old document into the structure you create in the new one. Because the old document is then only read and never modified, iterating over its childNodes is no longer disturbed.

    You might be able to use a similar trick with a document fragment (DOMDocumentFragment::appendXML()), passing serialized XML instead of working with node objects. This could work for the paragraph tags, at least.
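The importNode() suggestion can be sketched roughly as follows. This is a minimal sketch, assuming the input is a fragment like the one in the question; the function name buildQuoteList() is hypothetical, and so is the choice to turn each <h3> into a <cite> holding only the heading's text. The source document is only read, so looping over its live childNodes list is safe.

```php
<?php
// Sketch of the "new document + importNode()" approach.
// buildQuoteList() is a hypothetical name used for illustration only.
function buildQuoteList(string $content): string
{
    // Source document: read-only from here on.
    $src = new DOMDocument('1.0', 'utf-8');
    $src->loadHTML($content);
    $body = $src->getElementsByTagName('body')->item(0);

    // Destination document that receives the new structure.
    $dst = new DOMDocument('1.0', 'utf-8');
    $ul = $dst->appendChild($dst->createElement('ul'));
    $blockquote = null;

    foreach ($body->childNodes as $element) {
        if ($element->nodeType !== XML_ELEMENT_NODE) {
            continue; // skip whitespace text between tags
        }
        if ($blockquote === null) {
            $blockquote = $dst->createElement('blockquote');
        }
        if ($element->nodeName === 'p') {
            // Deep-copy the <p> (and its children) into the new document.
            $blockquote->appendChild($dst->importNode($element, true));
        } elseif ($element->nodeName === 'h3') {
            // Assumption: the <cite> carries only the heading's text.
            $cite = $dst->createElement('cite');
            $cite->appendChild($dst->createTextNode($element->textContent));
            $blockquote->appendChild($cite);

            $li = $dst->createElement('li');
            $li->appendChild($blockquote);
            $ul->appendChild($li);
            $blockquote = null;
        }
    }
    // Keep any trailing paragraphs that have no <h3> after them.
    if ($blockquote !== null && $blockquote->hasChildNodes()) {
        $li = $dst->createElement('li');
        $li->appendChild($blockquote);
        $ul->appendChild($li);
    }
    return $dst->saveHTML($ul);
}
```

Calling buildQuoteList($content) returns only the new <ul> markup. The fragment variant would instead pass serialized markup, for example $src->saveHTML($element), to DOMDocumentFragment::appendXML(), which only works when that markup is also well-formed XML.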