How to delete all nodes from DOMDocument except custom ones?


I have a DOMDocument in PHP and I'm trying to delete all nodes except a container with a specific ID.

Let's say I have the following DOM document:

<section>
  <div id="first-section">
    <ul>
      <li>Test</li>
      <li>Test</li>
    </ul>
  </div>
  <div id="second-section">
    <ul>
      <li>Test</li>
      <li>Test</li>
    </ul>
    <div id="sub-section">
      <h2>Hello World</h2>
    </div>
  </div>
  <div id="third-section">
    <ul>
      <li>Test</li>
      <li>Test</li>
    </ul>
  </div>
</section>

My PHP Code:

$domDocument = $this->domParser->loadHTML($markup);

$xpath = new \DOMXPath($domDocument);
$nlist = $xpath->query("//*[@id='sub-section']");

$domDocument->saveHTML();

With this code I query the correct container. But how could I remove all nodes except this node from my document, so that in the end I have the following nodes:

<div id="sub-section">
    <h2>Hello World</h2>
</div>

What I tried

I tried to go the reverse way with a query like this: "/*/*[not(@id='test')]". But it does not work well for nested HTML structures. Sometimes, depending on the structure, it removes all nodes.

What's the way to go here?


Solution

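  • That logic is strange. How do you decide what to keep, especially in a nested case? If you remove every node that does not match, you also remove the ancestors of the node you want to keep, and the node goes with them.

    I would pick the nodes I need and copy them to a new document.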

    Clone a node to a new document

    $xml = <<<'_XML'
    <section>
      <div id="first-section">
        <ul>
          <li>Test</li>
          <li>Test</li>
        </ul>
      </div>
      <div id="second-section">
        <ul>
          <li>Test</li>
          <li>Test</li>
        </ul>
        <div id="sub-section">
          <h2>Hello World</h2>
        </div>
      </div>
      <div id="third-section">
        <ul>
          <li>Test</li>
          <li>Test</li>
        </ul>
      </div>
    </section>
    _XML;
    
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($xml);
    
    $newDoc = new DOMDocument();
    $newDoc->appendChild($newDoc->importNode($doc->getElementById('sub-section'), true));
    
    echo $newDoc->saveHTML();
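
    If you would rather keep the XPath query from the question than switch to getElementById(), the same copy step can be sketched as follows. This is a minimal sketch, assuming a shortened stand-in for the markup above; the variable names $html and $result are made up for illustration.

```php
<?php
// Shortened, hypothetical stand-in for the markup used above.
$html = <<<'_HTML'
<section>
  <div id="first-section"><ul><li>Test</li></ul></div>
  <div id="second-section">
    <div id="sub-section"><h2>Hello World</h2></div>
  </div>
</section>
_HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Same query as in the question.
$xpath = new DOMXPath($doc);
$nlist = $xpath->query("//*[@id='sub-section']");

$newDoc = new DOMDocument();
$result = '';
if ($nlist !== false && $nlist->length > 0) {
    // importNode() with true makes a deep copy that belongs to $newDoc.
    $newDoc->appendChild($newDoc->importNode($nlist->item(0), true));
    $result = $newDoc->saveHTML();
}

echo $result;
```

    Only the first match is copied here, so the new document keeps a single root element.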
    

    Extract only one node

    When you need only one node, there is an even easier way:

    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($xml);
    echo $doc->saveHTML($doc->getElementById('sub-section'));
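
    One thing to watch with this shortcut: getElementById() returns null when the ID does not exist, and saveHTML(null) serializes the whole document. A small guard avoids that. The helper name outerHtmlById below is hypothetical, not part of the DOM extension, and the markup is a shortened stand-in.

```php
<?php
/**
 * Hypothetical helper: returns the outer HTML of the element with the
 * given id, or null when no such element exists.
 */
function outerHtmlById(DOMDocument $doc, string $id): ?string
{
    $node = $doc->getElementById($id);
    if ($node === null) {
        // Do not fall through to saveHTML(null), which dumps the whole document.
        return null;
    }
    $html = $doc->saveHTML($node);
    return $html === false ? null : $html;
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<section><div id="sub-section"><h2>Hello World</h2></div></section>');
libxml_clear_errors();

$found = outerHtmlById($doc, 'sub-section');
$missing = outerHtmlById($doc, 'no-such-id');

echo $found ?? 'not found';
```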
    

    Output

    Both examples produce the same output.

    <div id="sub-section">
          <h2>Hello World</h2>
        </div>
    

    Demo

    https://3v4l.org/ttTS6