php, xml, xmlreader

Use XMLReader to find node and retrieve XML from current node and following children


I'm trying to retrieve one specific node based on the <id> element from a huge XML file. I have used DOMDocument, but it's not ideal since it loads the whole document first. There are around 1400 <item> nodes in the document. This is a simplified version of the document:

<main>
  <body>
    ...
    <sub>
      ...
      <items>
        ...
        <item>
          <name>Abc</name>
          ...
          <id>123</id>
            <calls>
              <call>
                <name>Monkey</name>
                <text>Monkeys r cool</text>
                ...
              </call>
              <call>
                <name>Pig</name>
                <text>Pigs too!</text>
                ...
              </call>
            </calls>
            <cones>
              <cone>
                <name>Lorem</name>
                <text>Lorem ipsum</text>
                ...
              </cone>
              <cone>
                <name>More</name>
                <text>Placeholder</text>
                ...
              </cone>
            </cones>
          <a>true</a>
        </item>
        <item>
          <name>Def</name>
          ...
          <id>456</id>
            <calls>
              <call>
                <name>aa</name>
                <text>aa</text>
                ...
              </call>
              <call>
                <name>bb</name>
                <text>bb</text>
                ...
              </call>
            </calls>
            <cones>
              <cone>
                <name>cc</name>
                <text>cc</text>
                ...
              </cone>
              <cone>
                <name>dd</name>
                <text>dd</text>
                ...
              </cone>
            </cones>
          <a>true</a>
        </item>
      </items>
    </sub>
  </body>
</main>

So basically I'm trying to retrieve the data of the current node and its children by matching the <id> element. I have tried to find tutorials on XMLReader, but can't seem to find many. This is what I've tried so far:

$xml = new XMLReader();
$xml->open('doc.xml');

while($xml->read()) {
    if ($xml->nodeType === XMLReader::ELEMENT && $xml->localName === 'id') {
        // move from the <id> element to its text node
        $xml->read();
        echo $xml->value;
    }
}

This finds every <id> element, but I want to find one specific <id> and read the data from the matching <item> node and its children. Maybe I could use the example above to find the node and readInnerXml() to get the data?

I'm not an expert, so any help / push in the right direction is much appreciated :D


Solution

  • If all the item elements are siblings, you can use XMLReader::read() to find the first one and XMLReader::next() to iterate over them.

    Then use XMLReader::expand() to load each item and its descendants into DOM, and use XPath to read data from it.

    $searchForID = '123';
    
    $reader = new XMLReader();
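    // getXMLString() stands in for the XML source (not shown here);
    // for a file on disk use $reader->open('doc.xml') instead
    $reader->open('data:text/xml;base64,'.base64_encode(getXMLString()));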
    
    $document = new DOMDocument();
    $xpath = new DOMXpath($document);
    
    // look for the first "item" element node
    while (
      $reader->read() && $reader->localName !== 'item'
    ) {
      continue;
    }
    
    // iterate "item" sibling elements
    while ($reader->localName === 'item') {
      // expand into DOM
      $item = $reader->expand($document);
      // if the node has a child "id" with the searched contents
      if ($xpath->evaluate("count(self::*[id = '$searchForID']) > 0", $item)) {
        var_dump(
          [
            // fetch node text content as string  
            'name' => $xpath->evaluate('string(name)', $item),
            // fetch list of "call" elements and map them
            'calls' => array_map(
              function(DOMElement $call) use ($xpath) {
                return [
                  'name' => $xpath->evaluate('string(name)', $call),
                  'text' => $xpath->evaluate('string(text)', $call)
                ];
              },
              iterator_to_array(
                $xpath->evaluate('calls/call', $item)
              )
            )
          ] 
        );
      }
      $reader->next('item');
    }
    $reader->close();
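
    Regarding the readInnerXml() idea from the question: once the reader is positioned on the matching item, XMLReader::readOuterXml() returns that element including its own tags, while readInnerXml() returns only its contents. The following is a minimal, self-contained sketch of that variant, not a drop-in solution. The function name findItemXml() and the inline sample XML are made up for illustration, and it assumes the item elements are siblings, as above.

    ```php
    <?php
    // Hypothetical sample data, modelled on the simplified document in the question.
    $xml = '<main><body><sub><items>'
         . '<item><name>Abc</name><id>123</id><a>true</a></item>'
         . '<item><name>Def</name><id>456</id><a>true</a></item>'
         . '</items></sub></body></main>';

    // Illustrative helper: returns the outer XML of the first <item> whose
    // <id> child equals $id, or null if no such item exists.
    function findItemXml(string $xml, string $id): ?string
    {
        $reader = new XMLReader();
        // XML() reads from a string; use open('doc.xml') for a file on disk.
        $reader->XML($xml);

        $document = new DOMDocument();
        $xpath = new DOMXPath($document);

        // advance to the first "item" element node
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'item') {
                break;
            }
        }

        $found = null;
        // iterate "item" sibling elements
        while ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'item') {
            // expand only this item (and its descendants) into DOM
            $item = $reader->expand($document);
            // compare the text of the "id" child in PHP
            if ($xpath->evaluate('string(id)', $item) === $id) {
                // serialize the current element including its own tags
                $found = $reader->readOuterXml();
                break;
            }
            // skip the rest of this item and move to the next "item" node
            if (!$reader->next('item')) {
                break;
            }
        }
        $reader->close();

        return $found;
    }

    echo findItemXml($xml, '123'), PHP_EOL;
    ```

    Comparing string(id) in PHP, rather than interpolating the searched value into the XPath expression, avoids quoting problems if the ID ever contains an apostrophe.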
    

    XML with namespaces

    If the XML uses a namespace (like the one you linked in the comments) you will have to take it into consideration.

    For the XMLReader that means validating not just localName (the node name without any namespace prefix/alias) but the namespaceURI as well.

    For the DOM methods, that means using the namespace-aware methods (those with the NS suffix) and registering your own alias/prefix for the XPath expressions.

    $searchForID = '2755';
    
    $reader = new XMLReader();
    $reader->open('data:text/xml;base64,'.base64_encode(getXMLString()));
    
    // the namespace uri
    $xmlns_siri = 'http://www.siri.org.uk/siri';
    
    $document = new DOMDocument();
    $xpath = new DOMXpath($document);
    // register an alias for the siri namespace 
    $xpath->registerNamespace('siri', $xmlns_siri);
    
    // look for the first "EstimatedVehicleJourney" element node
    while (
      $reader->read() && 
      (
        $reader->localName !== 'EstimatedVehicleJourney' ||
        $reader->namespaceURI !== $xmlns_siri
      )
    ) {
      continue;
    }
    
    // iterate "EstimatedVehicleJourney" sibling elements
    while ($reader->localName === 'EstimatedVehicleJourney') {
      // validate the namespace of the node
      if ($reader->namespaceURI === $xmlns_siri) {
        // expand into DOM
        $item = $reader->expand($document);
        // if the node has a child "VehicleRef" with the searched contents
        // note the use of the registered namespace alias
        if ($xpath->evaluate("count(self::*[siri:VehicleRef = '$searchForID']) > 0", $item)) {
          var_dump(
            [
              // fetch node text content as string  
              'name' => $xpath->evaluate('string(siri:OriginName)', $item),
              // fetch list of "RecordedCall" elements and map them
              'calls' => array_map(
                function(DOMElement $call) use ($xpath) {
                  return [
                    'name' => $xpath->evaluate('string(siri:StopPointName)', $call),
                    'reference' => $xpath->evaluate('string(siri:StopPointRef)', $call)
                  ];
                },
                iterator_to_array(
                  $xpath->evaluate('siri:RecordedCalls/siri:RecordedCall', $item)
                )
              )
            ] 
          );
        }
      }
      $reader->next('EstimatedVehicleJourney');
    }
    $reader->close();
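
    The example above relies on XPath with a registered prefix and does not use the DOM methods with the NS suffix. As a rough sketch of that alternative, the expanded node can be queried with DOMElement::getElementsByTagNameNS(). The function name findOriginName() and the sample XML are invented for illustration; only the element names and the namespace URI come from the example above.

    ```php
    <?php
    // Hypothetical sample data; the real feed will contain far more elements.
    $xml = '<Siri xmlns="http://www.siri.org.uk/siri">'
         . '<EstimatedVehicleJourney>'
         . '<VehicleRef>2755</VehicleRef><OriginName>Central</OriginName>'
         . '</EstimatedVehicleJourney>'
         . '<EstimatedVehicleJourney>'
         . '<VehicleRef>3001</VehicleRef><OriginName>Harbour</OriginName>'
         . '</EstimatedVehicleJourney>'
         . '</Siri>';

    // Illustrative helper: returns the OriginName of the first journey whose
    // VehicleRef equals $vehicleRef, or null if there is no such journey.
    function findOriginName(string $xml, string $vehicleRef): ?string
    {
        $xmlns_siri = 'http://www.siri.org.uk/siri';

        $reader = new XMLReader();
        $reader->XML($xml);
        $document = new DOMDocument();

        $origin = null;
        while ($reader->read()) {
            // check the node type, the local name and the namespace URI
            if (
                $reader->nodeType !== XMLReader::ELEMENT ||
                $reader->localName !== 'EstimatedVehicleJourney' ||
                $reader->namespaceURI !== $xmlns_siri
            ) {
                continue;
            }
            // expand into DOM
            $journey = $reader->expand($document);
            // namespace-aware lookup; note that it matches descendants at any depth
            $refs = $journey->getElementsByTagNameNS($xmlns_siri, 'VehicleRef');
            if ($refs->length > 0 && $refs->item(0)->textContent === $vehicleRef) {
                $names = $journey->getElementsByTagNameNS($xmlns_siri, 'OriginName');
                $origin = $names->length > 0 ? $names->item(0)->textContent : null;
                break;
            }
        }
        $reader->close();

        return $origin;
    }

    var_dump(findOriginName($xml, '2755'));
    ```

    This sketch uses a plain read() loop for brevity, so it also visits the children of journeys that do not match. For large files, the read()/next() pattern shown above skips those subtrees instead.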