php xml xmlreader

How can I remove all comments from a large XML file?


I have a large XML file (over 200 MB) and I want to slim it down by removing all the comments. At that size, parsing the file and querying it takes a long time.

The parsing code is:

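<?php

// Note: $hotelCodes is not defined in this snippet; it is assumed to be set elsewhere.
$dom    = new DOMDocument();
$xpath  = new DOMXPath($dom);
$reader = new XMLReader();
$reader->open('http://www.bookingassist.ro/test/HotelsPro.xml');

// Stream through the file and expand only <Hotel> elements into the DOM.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'Hotel') {
        $node = $dom->importNode($reader->expand(), true);
        $dom->appendChild($node);
        // Returns the first image URL if this hotel has the wanted code, else an empty string.
        $result = $xpath->evaluate('string(self::Hotel[HotelCode = "'.$hotelCodes[3].'"]/HotelImages/ImageURL[1])', $node);
        // Detach the node again so the DOM never holds more than one hotel.
        $dom->removeChild($node);
        if ($result) {
            echo $result;
        }
    }
}
$reader->close();
?>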

Solution

  • Assuming XSLT is an option, you can use a modified version of the identity transform that outputs nothing for any matched comment:

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" indent="yes"/>
    
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:template>
    
      <xsl:template match="comment()"/>
    
    </xsl:stylesheet>
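
One possible way to run a stylesheet like this from PHP is sketched below. It assumes the `xsl` extension (`XSLTProcessor`) is enabled; the function name `stripXmlComments` is hypothetical and used only for illustration. The stylesheet is embedded as a string, with the same two templates as above.

```php
<?php

// Hypothetical helper (not from the original answer): removes comments from
// an XML string by applying an identity transform whose comment() template
// is empty. Assumes the PHP "xsl" extension is installed.
function stripXmlComments(string $xml): string
{
    $xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="comment()"/>
</xsl:stylesheet>
XSL;

    // Load the stylesheet and the source document into DOM trees.
    $stylesheet = new DOMDocument();
    $stylesheet->loadXML($xsl);

    $source = new DOMDocument();
    $source->loadXML($xml);

    // Compile the stylesheet and run the transformation.
    $processor = new XSLTProcessor();
    $processor->importStylesheet($stylesheet);

    $result = $processor->transformToXml($source);
    if (!is_string($result)) {
        throw new RuntimeException('XSLT transformation failed');
    }

    return $result;
}
```

For a file on disk, `DOMDocument::load()` and `XSLTProcessor::transformToUri()` can replace the string-based calls. Note that this approach loads the whole document into memory, so a 200 MB input may need a considerably higher memory limit; running the transform once and saving the stripped file is likely more practical than doing it on every request.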
    
