I'm trying to retrieve one specific node, identified by its <id> element, from a huge XML file. I have used DOMDocument, but it's not ideal since it loads the whole document first. There are around 1400 <item> nodes in the document. This is a simplified version of the document:
<main>
  <body>
    ...
    <sub>
      ...
      <items>
        ...
        <item>
          <name>Abc</name>
          ...
          <id>123</id>
          <calls>
            <call>
              <name>Monkey</name>
              <text>Monkeys r cool</text>
              ...
            </call>
            <call>
              <name>Pig</name>
              <text>Pigs too!</text>
              ...
            </call>
          </calls>
          <cones>
            <cone>
              <name>Lorem</name>
              <text>Lorem ipsum</text>
              ...
            </cone>
            <cone>
              <name>More</name>
              <text>Placeholder</text>
              ...
            </cone>
          </cones>
          <a>true</a>
        </item>
        <item>
          <name>Def</name>
          ...
          <id>456</id>
          <calls>
            <call>
              <name>aa</name>
              <text>aa</text>
              ...
            </call>
            <call>
              <name>bb</name>
              <text>bb</text>
              ...
            </call>
          </calls>
          <cones>
            <cone>
              <name>cc</name>
              <text>cc</text>
              ...
            </cone>
            <cone>
              <name>dd</name>
              <text>dd</text>
              ...
            </cone>
          </cones>
          <a>true</a>
        </item>
      </items>
    </sub>
  </body>
</main>
So basically I'm trying to retrieve the current node and its children's data by matching the <id> element. I have tried to find tutorials on XMLReader, but can't seem to find many. This is what I've tried so far:
$xml = new XMLReader();
$xml->open('doc.xml');
while ($xml->read()) {
    // stop at each opening <id> element
    if ($xml->nodeType === XMLReader::ELEMENT && $xml->localName === 'id') {
        // move to the text node inside <id> and print its value
        $xml->read();
        echo $xml->value;
    }
}
This finds every <id> element, but I want to find one specific element and read the data from the current node and its children. Maybe I could use the example to find the node and readInnerXml() to get the data.
I'm not an expert, so any help or push in the right direction is much appreciated :D
If all the item elements are siblings, you can use XMLReader::read() to find the first one and XMLReader::next() to iterate over them.
Then use XMLReader::expand() to load each item and its descendants into DOM, and use XPath to read data from it.
$searchForID = '123';
$reader = new XMLReader();
// getXMLString() is a helper (not shown) assumed to return the XML as a string
$reader->open('data:text/xml;base64,'.base64_encode(getXMLString()));
$document = new DOMDocument();
$xpath = new DOMXPath($document);
// look for the first "item" element node
while (
    $reader->read() && $reader->localName !== 'item'
) {
    continue;
}
// iterate "item" sibling elements
while ($reader->localName === 'item') {
    // expand into DOM
    $item = $reader->expand($document);
    // if the node has a child "id" with the searched contents
    if ($xpath->evaluate("count(self::*[id = '$searchForID']) > 0", $item)) {
        var_dump(
            [
                // fetch node text content as string
                'name' => $xpath->evaluate('string(name)', $item),
                // fetch list of "call" elements and map them
                'calls' => array_map(
                    function(DOMElement $call) use ($xpath) {
                        return [
                            'name' => $xpath->evaluate('string(name)', $call),
                            'text' => $xpath->evaluate('string(text)', $call)
                        ];
                    },
                    iterator_to_array(
                        $xpath->evaluate('calls/call', $item)
                    )
                )
            ]
        );
    }
    // skip the subtree and move to the next "item" sibling
    $reader->next('item');
}
$reader->close();
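Since the question asks for one specific item, the loop above can also stop at the first match. The following is only a minimal sketch under a few assumptions: findItemById() is a hypothetical helper name, the XML is passed in as a string (a cut-down version of the sample from the question), and the ID is compared in PHP rather than interpolated into the XPath expression, so an ID containing a quote cannot break the expression.

```php
<?php
// Hypothetical helper: returns the data of the first <item> whose <id>
// matches $searchForID, or null if no item matches.
function findItemById(string $xml, string $searchForID): ?array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $document = new DOMDocument();
    $xpath = new DOMXPath($document);

    // advance to the first "item" element
    while ($reader->read() && $reader->localName !== 'item') {
        continue;
    }

    $result = null;
    while ($reader->localName === 'item') {
        // load only this item and its descendants into DOM
        $item = $reader->expand($document);
        // compare the id in PHP instead of building it into the expression
        if ($xpath->evaluate('string(id)', $item) === $searchForID) {
            $result = [
                'name' => $xpath->evaluate('string(name)', $item),
                'calls' => [],
            ];
            foreach ($xpath->evaluate('calls/call', $item) as $call) {
                $result['calls'][] = [
                    'name' => $xpath->evaluate('string(name)', $call),
                    'text' => $xpath->evaluate('string(text)', $call),
                ];
            }
            break; // stop reading once the item has been found
        }
        $reader->next('item');
    }
    $reader->close();

    return $result;
}

// cut-down sample data based on the document in the question
$xml = <<<'XML'
<main><body><sub><items>
  <item><name>Abc</name><id>123</id>
    <calls>
      <call><name>Monkey</name><text>Monkeys r cool</text></call>
      <call><name>Pig</name><text>Pigs too!</text></call>
    </calls>
  </item>
  <item><name>Def</name><id>456</id>
    <calls><call><name>aa</name><text>aa</text></call></calls>
  </item>
</items></sub></body></main>
XML;

$found = findItemById($xml, '456');
var_dump($found);
```

For a file on disk you would presumably swap $reader->XML($xml) for $reader->open('doc.xml'), as in the question.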
If the XML uses a namespace (like the one you linked in the comments), you will have to take it into consideration.
For XMLReader, that means validating not just localName (the node name without any namespace prefix/alias) but the namespaceURI as well.
For DOM, that means using the namespace-aware methods (with the suffix NS) and registering your own alias/prefix for the XPath expressions.
$searchForID = '2755';
$reader = new XMLReader();
// getXMLString() is a helper (not shown) assumed to return the XML as a string
$reader->open('data:text/xml;base64,'.base64_encode(getXMLString()));
// the namespace uri
$xmlns_siri = 'http://www.siri.org.uk/siri';
$document = new DOMDocument();
$xpath = new DOMXPath($document);
// register an alias for the siri namespace
$xpath->registerNamespace('siri', $xmlns_siri);
// look for the first "EstimatedVehicleJourney" element node
while (
    $reader->read() &&
    (
        $reader->localName !== 'EstimatedVehicleJourney' ||
        $reader->namespaceURI !== $xmlns_siri
    )
) {
    continue;
}
// iterate "EstimatedVehicleJourney" sibling elements
while ($reader->localName === 'EstimatedVehicleJourney') {
    // validate the namespace of the node
    if ($reader->namespaceURI === $xmlns_siri) {
        // expand into DOM
        $item = $reader->expand($document);
        // if the node has a child "VehicleRef" with the searched contents
        // note the use of the registered namespace alias
        if ($xpath->evaluate("count(self::*[siri:VehicleRef = '$searchForID']) > 0", $item)) {
            var_dump(
                [
                    // fetch node text content as string
                    'name' => $xpath->evaluate('string(siri:OriginName)', $item),
                    // fetch list of "RecordedCall" elements and map them
                    'calls' => array_map(
                        function(DOMElement $call) use ($xpath) {
                            return [
                                'name' => $xpath->evaluate('string(siri:StopPointName)', $call),
                                'reference' => $xpath->evaluate('string(siri:StopPointRef)', $call)
                            ];
                        },
                        iterator_to_array(
                            $xpath->evaluate('siri:RecordedCalls/siri:RecordedCall', $item)
                        )
                    )
                ]
            );
        }
    }
    // skip the subtree and move to the next sibling with that local name
    $reader->next('EstimatedVehicleJourney');
}
$reader->close();
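To see why the namespaceURI check matters, here is a small self-contained sketch. The sample document is made up for illustration (it is not real SIRI data): it binds the namespace once as the default namespace and once to the prefix "s", and adds one element with the same local name but no namespace. Matching on localName plus namespaceURI should pick up the first two elements regardless of prefix and skip the third.

```php
<?php
$xmlns_siri = 'http://www.siri.org.uk/siri';

// Made-up sample data: same namespace with and without a prefix,
// plus one element that is not in the namespace at all
$xml = <<<'XML'
<root>
  <EstimatedVehicleJourney xmlns="http://www.siri.org.uk/siri">
    <VehicleRef>2755</VehicleRef>
    <OriginName>First</OriginName>
  </EstimatedVehicleJourney>
  <s:EstimatedVehicleJourney xmlns:s="http://www.siri.org.uk/siri">
    <s:VehicleRef>2756</s:VehicleRef>
    <s:OriginName>Second</s:OriginName>
  </s:EstimatedVehicleJourney>
  <EstimatedVehicleJourney>
    <VehicleRef>2757</VehicleRef>
    <OriginName>No namespace</OriginName>
  </EstimatedVehicleJourney>
</root>
XML;

$reader = new XMLReader();
$reader->XML($xml);

$document = new DOMDocument();
$xpath = new DOMXPath($document);
// the alias used in the expressions is our own choice,
// it does not have to match the prefix used in the document
$xpath->registerNamespace('siri', $xmlns_siri);

$origins = [];
while ($reader->read()) {
    if (
        $reader->nodeType === XMLReader::ELEMENT &&
        $reader->localName === 'EstimatedVehicleJourney' &&
        $reader->namespaceURI === $xmlns_siri
    ) {
        // expand the matching element and read a child via the alias
        $item = $reader->expand($document);
        $origins[] = $xpath->evaluate('string(siri:OriginName)', $item);
    }
}
$reader->close();

var_dump($origins);
```

This sketch uses a plain read() loop instead of next() to keep it short; for a large file with many siblings, the read()/next() pattern shown above avoids walking through every descendant node.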