Parsing only part of a document with expat in PHP


I am building a website in PHP in which the content is stored in an XML file. Basically I have a single index.php page, which checks the query string and serves the appropriate page from the XML.

For example, entering www.mysite.com/?page=home would cause the PHP script to check the XML file for a <page id="home"> tag and paste whatever is inside that tag into index.php.

The contents of <page> tags are stored as HTML, thus:

<xmlroot>
  <page id="home">
    <h1>An HTML Header Tag!</h1>
    <p>This is a paragraph</p>
  </page>
  [...etc]
</xmlroot>

I was hoping to be able to grab the appropriate <page> tag and somehow parse the contents. I know that everything in the <page> tag is valid HTML, so I was just going to use expat to run through the tags and echo them straight back out.

I am using DOMDocument to find the correct <page>, which works fine, except that the contents are returned as a DOMElement, whereas the expat parser requires a string. So I need to do one of two things:

  1. Magically convert the DOMElement to a string that keeps all the tags intact, so I can feed it to the expat parser. However, if I could do this I wouldn't need the expat parser at all; I could just echo the converted string straight out.

  2. Use something other than expat.

Incidentally, I know I could just replace the < and > in the XML with &lt; and &gt;, but this makes the code quite hard to read and edit. I'd like to avoid it if possible.


Solution

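  • <?php
    
    // Build a small document: <html><head><title>...</title></head></html>
    $doc = new DOMDocument('1.0');
    
    $root = $doc->createElement('html');
    $root = $doc->appendChild($root);
    
    $head = $doc->createElement('head');
    $head = $root->appendChild($head);
    
    $title = $doc->createElement('title');
    $title = $head->appendChild($title);
    
    // Text nodes hold raw text; special characters such as < are escaped on output.
    $text = $doc->createTextNode('< This is the title >');
    $text = $title->appendChild($text);
    
    // Serialize only the <head> node (and its descendants), not the whole document.
    echo $head->ownerDocument->saveXML($head);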
    

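    DOMDocument::saveXML() accepts an optional $node parameter. When it is given, only that node and its descendants are serialized to a string, rather than the whole document, so no expat pass is needed.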


    http://www.php.net/manual/en/domdocument.savexml.php
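
    Applied to the XML from the question, the same idea might look like the sketch below. It assumes PHP 7.1 or later, and renderPage() is a hypothetical helper name, not part of any library. The XML is inlined here so the example runs on its own; in the real site it would presumably come from the content file via DOMDocument::load(), with the id taken from $_GET['page'].

```php
<?php

// Hypothetical helper: returns the markup inside <page id="..."> as a string,
// or null if the XML cannot be parsed or no such page exists.
function renderPage(string $xml, string $pageId): ?string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return null;
    }

    // Compare the id attribute directly instead of building an XPath
    // expression from user input.
    $page = null;
    foreach ($doc->getElementsByTagName('page') as $candidate) {
        if ($candidate->getAttribute('id') === $pageId) {
            $page = $candidate;
            break;
        }
    }
    if ($page === null) {
        return null;
    }

    // Serialize each child node separately so the wrapping <page> tag
    // itself is not part of the output.
    $html = '';
    foreach ($page->childNodes as $child) {
        $html .= $doc->saveXML($child);
    }

    return trim($html);
}

// Sample data mirroring the structure shown in the question.
$xml = <<<XML
<xmlroot>
  <page id="home">
    <h1>An HTML Header Tag!</h1>
    <p>This is a paragraph</p>
  </page>
  <page id="about">
    <p>About us</p>
  </page>
</xmlroot>
XML;

echo renderPage($xml, 'home') ?? 'Page not found';
```

    One thing to watch: saveXML() writes empty elements in XML style (for example <br/>). If HTML-style output is preferred, DOMDocument::saveHTML() also accepts a node argument (PHP 5.3.6 and later) and could be swapped in for the saveXML() call.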