I just want to know why this fails. This is the testcase:
<?php
error_reporting(E_ALL);
ini_set('display_errors', '1');
$doc = new DOMDocument();
$doc->loadHTML("<!DOCTYPE html><html><body><div id='testId'>Test</div></body></html>");
echo "This works: ".$doc->getElementById('testId')->nodeValue.'<br/>';
$fragment = $doc->createDocumentFragment();
$fragment->appendXML("<p id='testId2'>Test 2</p>");
$doc->getElementById('testId')->appendChild($fragment);
echo "This still works: ".$doc->getElementById('testId')->nodeValue.'<br/>';
echo "This doesn't work: ".$doc->getElementById('testId2')->nodeValue.'<br/>';
The workaround is to use
$xpath = new \DOMXPath($doc);
$value = $xpath->query('//*[@id="testId2"]')[0]->nodeValue;
The original DOM 2.0 specification says:
getElementById (introduced in DOM Level 2)
Returns the Element whose ID is given by elementId. If no such element exists, returns null. Behavior is not defined if more than one element has this ID.
Note: The DOM implementation must have information that says which attributes are of type ID. Attributes with the name "ID" are not of type ID unless so defined. Implementations that do not know whether attributes are of type ID or not are expected to return null.
The important part of that is: "Attributes with the name "ID" are not of type ID unless so defined."
When you're working with HTML, the built-in DTD defines "id" as the element ID attribute:
<!ENTITY % coreattrs
"id ID #IMPLIED -- document-wide unique id --
class CDATA #IMPLIED -- space-separated list of classes --
style %StyleSheet; #IMPLIED -- associated style info --
title %Text; #IMPLIED -- advisory title --"
>
However, when you append elements to a document fragment using DOMDocumentFragment::appendXML(), you are parsing raw XML, which has no such DTD. (This isn't intuitive, since you've appended it to an HTML document, but the whole DOMDocument API is far from intuitive!)
PHP does address this in the documentation for DOMDocument::getElementById():
For this function to work, you will need either to set some ID attributes with DOMElement::setIdAttribute or a DTD which defines an attribute to be of type ID.
So, the solution is to simply tell the parser that the id attribute is, in fact, the ID attribute:
$doc = new DOMDocument();
$doc->loadHTML("<!DOCTYPE html><html><body><div id='testId'>Test</div></body></html>");
echo "This works: ".$doc->getElementById('testId')->nodeValue.'<br/>';
$fragment = $doc->createDocumentFragment();
$fragment->appendXML("<p id='testId2'>Test 2</p>");
// here's the magic: mark "id" as an ID attribute on the fragment's first child
$fragment->childNodes[0]->setIdAttribute("id", true);
$doc->getElementById('testId')->appendChild($fragment);
echo "This still works: ".$doc->getElementById('testId')->nodeValue.'<br/>';
echo "And so does this: ".$doc->getElementById('testId2')->nodeValue.'<br/>';
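Note that the line above only marks the first top-level element of the fragment. If the fragment contains several elements or nested ones with ids, each needs the same treatment. A minimal sketch of one way to do that, assuming a recursive walk is acceptable; markIdAttributes() is a hypothetical helper name, not part of PHP's DOM extension, and the section/p markup is made up for illustration:

```php
<?php
// Hypothetical helper (not part of the DOM extension): walks every element
// beneath $root and registers $attr as an ID attribute wherever it is present.
function markIdAttributes(DOMNode $root, string $attr = 'id'): void
{
    foreach ($root->childNodes as $child) {
        if ($child instanceof DOMElement) {
            if ($child->hasAttribute($attr)) {
                $child->setIdAttribute($attr, true);
            }
            // Recurse so nested elements are covered too.
            markIdAttributes($child, $attr);
        }
    }
}

$doc = new DOMDocument();
$doc->loadHTML("<!DOCTYPE html><html><body><div id='testId'>Test</div></body></html>");

$fragment = $doc->createDocumentFragment();
$fragment->appendXML("<section id='outer'><p id='inner'>Nested</p></section>");

// Mark the ids while the nodes are still in the fragment, then append it.
markIdAttributes($fragment);
$doc->getElementById('testId')->appendChild($fragment);

$inner = $doc->getElementById('inner');
echo $inner !== null ? $inner->nodeValue : 'not found', "\n";
```

The explicit null check is there because getElementById() returns null when no element is registered under that ID, and reading nodeValue from null would otherwise raise a warning.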