
How to read an XML file with an undefined namespace with XMLReader?


I'm relatively new to parsing XML files and am attempting to read a large XML file with XMLReader.

<?xml version="1.0" encoding="UTF-8"?>
<ShowVehicleRemarketing environment="Production" lang="en-CA" release="8.1-Lite" xsi:schemaLocation="http://www.starstandards.org/STAR /STAR/Rev4.2.4/BODs/Standalone/ShowVehicleRemarketing.xsd">
  <ApplicationArea>
    <Sender>
      <Component>Component</Component>
      <Task>Task</Task>
      <ReferenceId>w5/cron</ReferenceId>
      <CreatorNameCode>CreatorNameCode</CreatorNameCode>
      <SenderNameCode>SenderNameCode</SenderNameCode>
      <SenderURI>http://www.example.com</SenderURI>
      <Language>en-CA</Language>
      <ServiceId>ServiceId</ServiceId>
    </Sender>
    <CreationDateTime>CreationDateTime</CreationDateTime>
    <Destination>
      <DestinationNameCode>example</DestinationNameCode>
    </Destination>
  </ApplicationArea>
...

I am receiving the following error:

ErrorException [ Warning ]: XMLReader::read() [xmlreader.read]: compress.zlib://D:/WebDev/example/local/public/../upload/example.xml.gz:2: namespace error : Namespace prefix xsi for schemaLocation on ShowVehicleRemarketing is not defined

I've searched around and can't find much useful information on using XMLReader to read XML files with namespaces. How would I go about defining a namespace, if that is in fact what I need to do? Links to pertinent resources would also help.


Solution

  • The xsi prefix is used on the root element but never declared, so the document needs a definition of the xsi namespace. E.g.

    <ShowVehicleRemarketing
      environment="Production"
      lang="en-CA"
      release="8.1-Lite"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.starstandards.org/STAR /STAR/Rev4.2.4/BODs/Standalone/ShowVehicleRemarketing.xsd"
    >
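
    To check that the declaration is enough, here is a minimal sketch that parses a shortened, hypothetical version of the document (only the root element and one child are kept, and it is supplied as a string rather than a file) and reads the attribute back through its namespace. The variable names are mine, not part of the original feed.

```php
<?php
// Shortened, hypothetical stand-in for the real file; the root element now
// declares the xsi prefix before using it.
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<ShowVehicleRemarketing'
     . ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
     . ' xsi:schemaLocation="http://www.starstandards.org/STAR /STAR/Rev4.2.4/BODs/Standalone/ShowVehicleRemarketing.xsd">'
     . '<ApplicationArea/>'
     . '</ShowVehicleRemarketing>';

$reader = new XMLReader();
$reader->XML($xml);

$names = [];
$schemaLocation = null;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT) {
        $names[] = $reader->localName;
        if ($reader->depth === 0) {
            // Look the attribute up by local name and namespace URI, not by prefix.
            $schemaLocation = $reader->getAttributeNs(
                'schemaLocation',
                'http://www.w3.org/2001/XMLSchema-instance'
            );
        }
    }
}
$reader->close();

echo implode(', ', $names), "\n";
echo $schemaLocation, "\n";
```

    With the declaration in place the loop should run without the namespace warning; removing the xmlns:xsi line from the string should bring the warning back.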
    

    Update: You could write a user-defined stream filter and let XMLReader read the file through that filter, something like:

    stream_filter_register('darn', 'DarnFilter');
    $src = 'php://filter/read=darn/resource=compress.zlib://something.xml.gz';
    $reader->open($src);
    

    The content read by the compress.zlib wrapper is then "routed" through the DarnFilter, which has to find the (first) location where it can insert the xmlns:xsi declaration. But this is quite messy and will take some effort to do right, because the data arrives in buckets of arbitrary size (e.g. theoretically bucket A could contain xs, bucket B i:schem and bucket C aLocation=").
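
    The bucket problem can be illustrated without any stream at all. This is only a sketch: the three chunks are hypothetical stand-ins for buckets, and the search string is chosen for the demonstration. Looking at each chunk on its own misses the attribute, while accumulating chunks until the first start tag is complete finds it.

```php
<?php
// Hypothetical chunks, standing in for the buckets a stream filter receives.
$chunks = ['<Root xs', 'i:schem', 'aLocation="a b">'];
$needle = 'xsi:schemaLocation';

// Naive approach: inspect each chunk on its own.
$foundPerChunk = false;
foreach ($chunks as $chunk) {
    if (strpos($chunk, $needle) !== false) {
        $foundPerChunk = true;
    }
}

// Buffered approach: accumulate chunks until the first '>' has arrived,
// then search the whole buffer.
$buffer = '';
$foundBuffered = false;
foreach ($chunks as $chunk) {
    $buffer .= $chunk;
    if (strpos($buffer, '>') !== false) {
        $foundBuffered = strpos($buffer, $needle) !== false;
        break;
    }
}

var_dump($foundPerChunk); // bool(false)
var_dump($foundBuffered); // bool(true)
```

    This is why a filter has to keep its own buffer across calls instead of rewriting each bucket independently.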


    Update 2: Here's an ad-hoc example of a filter in PHP that inserts the xsi namespace declaration. It is mostly untested (it worked with the one test I ran ;-) ) and undocumented. Take it as a proof of concept, not production code.

    <?php
    stream_filter_register('darn', 'DarnFilter');
    $src = 'php://filter/read=darn/resource=compress.zlib://d:/test.xml.gz';
    
    $r = new XMLReader;
    $r->open($src);
    while($r->read()) {
      echo '.';
    }
    
    class DarnFilter extends php_user_filter {
      protected $buffer='';
      protected $status = PSFS_FEED_ME;
    
      public function filter($in, $out, &$consumed, $closing)
      {
        while ( $bucket = stream_bucket_make_writeable($in) ) {
          $consumed += $bucket->datalen;
          if ( PSFS_PASS_ON == $this->status ) {
            // we're already done, just copy the content
            stream_bucket_append($out, $bucket);
          }
          else {
            $this->buffer .= $bucket->data;
            if ( $this->foo() ) {
              // first element found
              // send the current buffer          
              $bucket->data = $this->buffer;
              $bucket->datalen = strlen($bucket->data);
              stream_bucket_append($out, $bucket);
              $this->buffer = null;
              // no need for further processing
              $this->status = PSFS_PASS_ON;
            }
          }
        }
        return $this->status;
      }
    
      /* looks for the first (root) element in $this->buffer
      *  if it doesn't contain a xsi namespace decl inserts it
      */
      protected function foo() {
        $rc = false;
        if ( preg_match('!<([^?>\s]+)\s?([^>]*)>!', $this->buffer, $m, PREG_OFFSET_CAPTURE) ) {
          $rc = true;
          if ( false===strpos($m[2][0], 'xmlns:xsi') ) {
            echo ' inserting xsi decl ';
            $in = '<'.$m[1][0]
              . ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
              . $m[2][0] . '>';    
            $this->buffer = substr($this->buffer, 0, $m[0][1])
              . $in
              . substr($this->buffer, $m[0][1] + strlen($m[0][0]));
          }
        }
        return $rc;
      }
    }
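
    The insertion step can also be tried on its own, outside the filter. The following is a sketch only: addXsiDeclaration() and XSI_NS are hypothetical names, not part of the filter above, and the pattern is a slightly stricter variant of the one in foo(). It only accepts element names that start with a letter or underscore, so the XML declaration, comments and a DOCTYPE are skipped, and it keeps a self-closing root well-formed. Like the original, it can still be fooled by a ">" inside an attribute value or by tag-like text inside a comment.

```php
<?php
const XSI_NS = 'http://www.w3.org/2001/XMLSchema-instance';

/**
 * Hypothetical helper: adds an xmlns:xsi declaration to the first start tag
 * found in $xml, unless that tag already carries one.
 */
function addXsiDeclaration(string $xml): string
{
    // Group 1: element name, group 2: attributes, group 3: optional "/" of <root/>.
    $pattern = '~<([A-Za-z_][^\s/>]*)([^>]*?)(/?)>~';
    if (!preg_match($pattern, $xml, $m, PREG_OFFSET_CAPTURE)) {
        return $xml; // no complete start tag in the buffer yet
    }
    if (strpos($m[2][0], 'xmlns:xsi') !== false) {
        return $xml; // already declared, nothing to do
    }
    $tag = '<' . $m[1][0] . ' xmlns:xsi="' . XSI_NS . '"' . $m[2][0] . $m[3][0] . '>';
    // Swap the matched tag for the patched one, leaving the rest untouched.
    return substr_replace($xml, $tag, $m[0][1], strlen($m[0][0]));
}

echo addXsiDeclaration('<?xml version="1.0"?><Root xsi:schemaLocation="a b"><Child/></Root>'), "\n";
```

    When no complete start tag is present the input comes back unchanged, which is the signal a filter would use to keep buffering.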
    

    Update 3: And here's an ad-hoc solution written in C#

    // needs: using System.IO; using System.IO.Compression; using System.Xml;
    XmlNamespaceManager nsmgr = new XmlNamespaceManager(new NameTable());
    // prime the XMLReader with the xsi namespace
    nsmgr.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
    
    using ( XmlReader reader = XmlReader.Create(
      new GZipStream(new FileStream(@"\test.xml.gz", FileMode.Open, FileAccess.Read), CompressionMode.Decompress),
      new XmlReaderSettings(),
      new XmlParserContext(null, nsmgr, null, XmlSpace.None)
    )) {
      while (reader.Read())
      {
        System.Console.Write('.');
      }
    }