
Add encoded XHTML to an XML element using PHP


I want to create an XML file that embeds encoded XHTML. The encoded XHTML is stored in a separate file. While creating the XML element test, I want to add the encoded XHTML content to it. When I echo the final output to the browser, the browser shows an error:

This page contains the following errors: error on line 9 at column 144: Encoding error. Below is a rendering of the page up to the first error.

    <?php
    $dom = new DOMDocument('1.0', 'utf-8');
    $content = (file_get_contents("test_xmlencoding.xhtml"));
    $element = $dom->createElement('test', $content);
    $dom->appendChild($element);
    header('Content-type: text/xml;');
    echo $dom->saveXML();
    ?>

XHTML file

&lt;?xml version="1.0" ?&gt;
&lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
&lt;head&gt;
&lt;meta content="TX21_HTM 21.0.406.501" name="GENERATOR" /&gt;
&lt;title&gt;&lt;/title&gt;
&lt;/head&gt;
&lt;body style="font-family:'Arial';font-size:12pt;text-align:left;"&gt;
&lt;p lang="en-US" style="margin-top:0pt;margin-bottom:0pt;"&gt;&lt;span style="font-family:'Verdana';font-size:9pt;"&gt;ABC1.&lt;/span&gt;&lt;/p&gt;
&lt;p lang="en-US" style="margin-top:0pt;margin-bottom:0pt;"&gt;&lt;span style="font-family:'Verdana';font-size:9pt;"&gt;(ABC2)&lt;/span&gt;&lt;/p&gt;
&lt;p lang="en-US" style="margin-top:0pt;margin-bottom:0pt;"&gt;&lt;span style="font-family:'Verdana';font-size:9pt;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;p lang="en-US" style="margin-top:0pt;margin-bottom:0pt;"&gt;&lt;span style="font-family:'Verdana';font-size:9pt;"&gt;ABC3&lt;/span&gt;&lt;/p&gt;

&lt;/body&gt;
&lt;/html&gt;

When I add the XHTML content without encoding, the output renders in the browser without error.

I have tried replacing

    $content = (file_get_contents("test_xmlencoding.xhtml"));

with

    $content = htmlentities(file_get_contents("test_xmlencoding.xhtml"));

The output then shows only the closing tag of the test element, </test>.


Solution

  • The second argument of DOMDocument::createElement() and the DOMNode::$nodeValue property escape their value only partially. They escape < and >, but expect every other special character, in particular &, to already be written as an entity.

    $document = new DOMDocument();
    $document->appendChild(
      $tests = $document->createElement('tests')
    );
    $tests
      ->appendChild($document->createElement('test', 'a < b'));
    $tests
      ->appendChild($document->createElement('test', 'a & b'));
    echo $document->saveXML();
    

    Output:

    Warning: DOMDocument::createElement(): unterminated entity reference b in ... on line 9
    <?xml version="1.0"?>
    <tests><test>a &lt; b</test><test/></tests>
    

    The method argument is not part of the DOM standard, and the property behaves differently from the specification.
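
    If the second argument of createElement() has to be used anyway, the value can be escaped beforehand. The following is a minimal sketch of that idea, not something the DOM extension provides: escapeForCreateElement() is a hypothetical helper name, and it assumes the input is valid UTF-8.

    ```php
    <?php
    // Hypothetical helper (not part of PHP or DOM): prepares a string for the
    // second argument of DOMDocument::createElement().
    function escapeForCreateElement(string $value): string
    {
        // ENT_XML1 restricts the result to XML's predefined entities.
        // ENT_NOQUOTES leaves quotes alone; they need no escaping in element content.
        return htmlspecialchars($value, ENT_XML1 | ENT_NOQUOTES, 'UTF-8');
    }

    $document = new DOMDocument();
    $test = $document->appendChild(
        $document->createElement('test', escapeForCreateElement('a & b < c'))
    );

    // Serialize only the element, without the XML declaration.
    echo $document->saveXML($test), "\n";
    ```

    The entities in the escaped string are resolved when the element is created, so the node holds the text a & b < c and no "unterminated entity reference" warning is raised.');
</test>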

    In the original DOM you were expected to add the content as a separate text node, which also allows mixed child nodes. Later DOM versions introduced the textContent property (DOMNode::$textContent in PHP), which acts as a shortcut.
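
    As a small illustration of mixed child nodes, the sketch below puts text, an element, and more text into one parent. The element names and strings are made up for the example.

    ```php
    <?php
    $document = new DOMDocument();
    $test = $document->appendChild($document->createElement('test'));

    // Text, an element, and more text as siblings inside the same parent.
    $test->appendChild($document->createTextNode('a & b, then '));
    $test
        ->appendChild($document->createElement('em'))
        ->appendChild($document->createTextNode('emphasis'));
    $test->appendChild($document->createTextNode(' and a < c'));

    // Text nodes are escaped during serialization, so & and < are safe here.
    echo $document->saveXML($test), "\n";
    // 
    ```
<test>
assert($test->childNodes->length === 3);
assert($test->textContent === 'a & b, then emphasis and a < c');
assert($document->saveXML($test) === '<test>a &amp; b, then <em>emphasis</em> and a &lt; c</test>');
</test>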

    Here is an example that stores an XHTML string as text, once through textContent and once through a text node:

    $xhtml = <<<'XHTML'
    <?xml version="1.0" ?>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <body>
        <em>a &amp; b</em>
      </body>
    </html>
    XHTML;
    
    $document = new DOMDocument();
    $document->appendChild(
      $tests = $document->createElement('tests')
    );
    // append child element and set its text content
    $tests
      ->appendChild($document->createElement('test'))
      ->textContent = $xhtml;
    // append child element, then append child text node
    $tests
      ->appendChild($document->createElement('test'))
      ->appendChild($document->createTextNode($xhtml));  
      
    $document->formatOutput = true;
    echo $document->saveXML();
    

    Output (take note of the double-escaped &amp;amp;):

    <?xml version="1.0"?>
    <tests>
      <test>&lt;?xml version="1.0" ?&gt;
    &lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
      &lt;body&gt;
        &lt;em&gt;a &amp;amp; b&lt;/em&gt;
      &lt;/body&gt;
    &lt;/html&gt;</test>
      <test>&lt;?xml version="1.0" ?&gt;
    &lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
      &lt;body&gt;
        &lt;em&gt;a &amp;amp; b&lt;/em&gt;
      &lt;/body&gt;
    &lt;/html&gt;</test>
    </tests>
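
    Applied to the code from the question, the same approach could look roughly like the sketch below. The temporary file and its content are stand-ins for test_xmlencoding.xhtml so that the snippet runs on its own, and the file content is assumed to be valid UTF-8. The last part reads the document back to show that the double escaping is undone on parsing.

    ```php
    <?php
    // Stand-in for the question's test_xmlencoding.xhtml (assumption: the
    // real file is valid UTF-8).
    $file = tempnam(sys_get_temp_dir(), 'xhtml');
    file_put_contents($file, '<p lang="en-US">ABC1 &amp; ABC2</p>');

    $content = file_get_contents($file);
    unlink($file);

    $dom = new DOMDocument('1.0', 'utf-8');
    $element = $dom->appendChild($dom->createElement('test'));
    // The text node is escaped by the serializer; no manual encoding is needed.
    $element->appendChild($dom->createTextNode($content));
    $xml = $dom->saveXML();

    // Parsing the result returns the original string unchanged.
    $loaded = new DOMDocument();
    $loaded->loadXML($xml);
    $restored = $loaded->documentElement->textContent;

    var_dump($restored === $content);
    ```

    In the question's script, the header() call and echo $dom->saveXML() would follow after the text node has been appended.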