DOMDocument appendXML with special characters


I am retrieving some HTML strings from my database and would like to parse them into my DOMDocument. The problem is that DOMDocument emits warnings when it hits special characters:

Warning: DOMDocumentFragment::appendXML() [domdocumentfragment.appendxml]: Entity: line 2: parser error : Entity 'nbsp' not defined in page.php on line 189

I wonder why this happens and how to solve it. Here are some code fragments from my page. How can I fix this kind of warning?

$doc = new DOMDocument();

// .. create some elements first, like some divs and a h1 ..

while($row = mysql_fetch_array($result))
{
    $messageEl = $doc->createDocumentFragment();
    $messageEl->appendXML($row['message']); // the warnings are raised here

    $otherElement->appendChild($messageEl);
}

echo $doc->saveHTML();

I also found something about validation, but when I apply that, my page won't load anymore. The code I tried for that was something like this.

$implementation = new DOMImplementation();
$dtd = $implementation->createDocumentType('html','-//W3C//DTD XHTML 1.0 Transitional//EN','http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd');

$doc = $implementation->createDocument('','',$dtd);
$doc->validateOnParse = true;
$doc->formatOutput = true;

// in the same while loop, I used the following:
$messageEl = $doc->createDocumentFragment();
$doc->validate(); // this stopped my code, without any error or warning
$messageEl->appendXML($row['message']);

Thanks in advance!


Solution

  • There is no &nbsp; in XML. The only character entities that have an actual name defined (instead of using a numeric reference) are &amp;, &lt;, &gt;, &quot; and &apos;.

    That means you have to use the numeric equivalent of a non-breaking space, which is &#160; or (in hex) &#xA0;.

    If you are trying to save HTML into an XML container, then save it as text. HTML and XML may look similar, but they are distinct formats, and appendXML() expects well-formed XML as its argument. Use the nodeValue property instead; it will XML-encode your HTML string without any warnings.

    // document fragment is completely unnecessary
    $otherElement->nodeValue = $row['message'];
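
If the stored markup should still be parsed as elements rather than saved as text, one possible workaround is to rewrite the HTML-only named entities before calling appendXML(). The following is a minimal sketch, not a definitive implementation. It assumes PHP 7 or later and UTF-8 input, and the helper name namedEntitiesToXmlSafe() and the sample string are hypothetical. Each named entity is decoded to its UTF-8 character and then re-escaped with XML rules, so &nbsp; becomes a literal non-breaking space while &amp; and &lt; stay escaped.

```php
<?php
// Hypothetical helper: make HTML named entities safe for an XML parser.
// Each named entity is decoded to its UTF-8 character and then re-escaped
// using XML rules, so only XML-safe escapes remain in the result.
function namedEntitiesToXmlSafe(string $html): string
{
    return preg_replace_callback(
        '/&[A-Za-z][A-Za-z0-9]*;/', // named entities only; numeric ones are left alone
        function (array $match): string {
            $decoded = html_entity_decode($match[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            // Re-escape &, <, >, " and ' so the markup stays well-formed.
            // An unknown entity such as &foo; ends up as the literal text "&foo;".
            return htmlspecialchars($decoded, ENT_QUOTES | ENT_XML1, 'UTF-8');
        },
        $html
    );
}

$doc = new DOMDocument('1.0', 'UTF-8');
$container = $doc->appendChild($doc->createElement('div'));

// Sample message standing in for $row['message'] (assumed data).
$message = '<p>Hello&nbsp;world &amp; more</p>';

$fragment = $doc->createDocumentFragment();
$fragment->appendXML(namedEntitiesToXmlSafe($message));
$container->appendChild($fragment);

echo $doc->saveHTML();
```

This only deals with entities. appendXML() still needs well-formed markup, so stored HTML with unclosed tags such as a bare <br> will keep failing and is better saved as text, as described above.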