PHP DOMDocument::getElementById fails with DOMDocumentFragment


I just want to know why this fails. This is the test case:

<?php
error_reporting(E_ALL);
ini_set('display_errors', '1');

$doc = new DOMDocument();
$doc->loadHTML("<!DOCTYPE html><html><body><div id='testId'>Test</div></body></html>");
echo "This works: ".$doc->getElementById('testId')->nodeValue.'<br/>';

$fragment = $doc->createDocumentFragment();
$fragment->appendXML("<p id='testId2'>Test 2</p>");
$doc->getElementById('testId')->appendChild($fragment);

echo "This still works: ".$doc->getElementById('testId')->nodeValue.'<br/>';
echo "This doesn't work: ".$doc->getElementById('testId2')->nodeValue.'<br/>';

The workaround is to use XPath instead:

$xpath = new DOMXPath($doc);
// query() returns a DOMNodeList; take the first match and read its text
$value = $xpath->query('//*[@id="testId2"]')[0]->nodeValue;
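
For reference, the XPath workaround can be wrapped so that a missing element yields null rather than an error. This is only a sketch: `findByIdAttribute` is a hypothetical helper name, not part of the DOM extension, and it assumes the id value contains no double quote.

```php
<?php
// Hypothetical helper (not part of the DOM extension): find the first element
// whose "id" attribute equals $id, regardless of whether libxml knows that
// attribute is of type ID. Assumes $id contains no double quote.
function findByIdAttribute(DOMDocument $doc, string $id): ?DOMElement
{
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    $node = $nodes->item(0);
    return $node instanceof DOMElement ? $node : null;
}

$doc = new DOMDocument();
$doc->loadHTML("<!DOCTYPE html><html><body><div id='testId'>Test</div></body></html>");

$fragment = $doc->createDocumentFragment();
$fragment->appendXML("<p id='testId2'>Test 2</p>");
$doc->getElementById('testId')->appendChild($fragment);

// Look the new element up by attribute value instead of by ID type.
$found = findByIdAttribute($doc, 'testId2');
echo $found !== null ? $found->nodeValue : 'not found', PHP_EOL;
```

This matches on the attribute value only, so it sidesteps the question of ID typing entirely, at the cost of scanning the whole document on every lookup.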

Solution

  • The original DOM 2.0 specification says:

    getElementById introduced in DOM Level 2

    Returns the Element whose ID is given by elementId. If no such element exists, returns null. Behavior is not defined if more than one element has this ID.

    Note: The DOM implementation must have information that says which attributes are of type ID. Attributes with the name "ID" are not of type ID unless so defined. Implementations that do not know whether attributes are of type ID or not are expected to return null.

    The important part of that is: "Attributes with the name "ID" are not of type ID unless so defined."

    When you're working with HTML, the built-in DTD defines "id" as the element ID attribute:

    <!ENTITY % coreattrs
     "id          ID             #IMPLIED  -- document-wide unique id --
      class       CDATA          #IMPLIED  -- space-separated list of classes --
      style       %StyleSheet;   #IMPLIED  -- associated style info --
      title       %Text;         #IMPLIED  -- advisory title --"
      >
    

    However, when you append elements to a document fragment using DOMDocumentFragment::appendXML(), you are parsing raw XML, which has no such DTD. (This isn't very intuitive, since you then append the fragment to an HTML document, but the whole DOMDocument API is far from intuitive!)

    PHP does address this in the documentation for DOMDocument::getElementById():

    For this function to work, you will need either to set some ID attributes with DOMElement::setIdAttribute or a DTD which defines an attribute to be of type ID.

    So, the solution is simply to tell the document that the id attribute on the new element is, in fact, an ID attribute:

    $doc = new DOMDocument();
    $doc->loadHTML("<!DOCTYPE html><html><body><div id='testId'>Test</div></body></html>");
    echo "This works: ".$doc->getElementById('testId')->nodeValue.'<br/>';
    
    $fragment = $doc->createDocumentFragment();
    $fragment->appendXML("<p id='testId2'>Test 2</p>");
    // mark the "id" attribute of the fragment's first child as an ID,
    // so that getElementById() can find the element later
    $fragment->firstChild->setIdAttribute("id", true);
    $doc->getElementById('testId')->appendChild($fragment);
    
    echo "This still works: ".$doc->getElementById('testId')->nodeValue.'<br/>';
    echo "And so does this: ".$doc->getElementById('testId2')->nodeValue.'<br/>';
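
The fix above only marks the fragment's first child. If the fragment contains several elements with ids, possibly nested, the same idea can be applied to the whole subtree. The following is a minimal sketch under that assumption; `markIdAttributes` is a hypothetical helper name, and the markup passed to appendXML is made up for illustration.

```php
<?php
// Hypothetical helper: walk a subtree and flag every "id" attribute as being
// of type ID, using DOMElement::setIdAttribute() as in the answer above.
function markIdAttributes(DOMNode $node): void
{
    if ($node instanceof DOMElement && $node->hasAttribute('id')) {
        $node->setIdAttribute('id', true);
    }
    // Only descend into nodes that actually have children.
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            markIdAttributes($child);
        }
    }
}

$doc = new DOMDocument();
$doc->loadHTML("<!DOCTYPE html><html><body><div id='testId'>Test</div></body></html>");

$fragment = $doc->createDocumentFragment();
$fragment->appendXML("<div id='outer'><p id='inner'>Nested</p></div>");

// DOMAttr::isId() reports whether libxml currently treats this attribute as an ID.
var_dump($fragment->firstChild->getAttributeNode('id')->isId());

// Mark all ids in the fragment, then attach it to the document.
markIdAttributes($fragment);
$doc->getElementById('testId')->appendChild($fragment);

$inner = $doc->getElementById('inner');
echo $inner !== null ? $inner->nodeValue : 'not found', PHP_EOL;
```

Calling the helper before appendChild() mirrors the order used in the answer; the isId() check is only there as a diagnostic for seeing which attributes the document considers IDs.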