php xml xpath

How to get all nodes in XML?


I want to get all <g:additional_image_link> elements from an XML file. This is what I have tried:

$reader = XMLReader::open($path);
$document = new DOMDocument();
$xpath = new DOMXpath($document);

while ($reader->read() && $reader->localName !== 'item') {
    continue;
}
while ($reader->localName === 'item') {
    // expand into DOM
    $item = $reader->expand($document);

    $query = 'count(g:additional_image_link)';
    $entries = $xpath->evaluate($query, $item);
    echo "There are $entries additional_image_links\n";

    $nodes = $xpath->evaluate('string(g:additional_image_link)', $item);
    echo "\n<pre>nodes";
    var_dump($nodes);
    echo '</pre>';

The count in $entries is correct, but $nodes contains only the first entry:

<pre>nodesstring(18) "https://image1.jpg"
</pre>array(0) {

How can I get all the images as an array? The XML looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <item>
            <title>A</title>
            <g:additional_image_link>https://image1.jpg</g:additional_image_link>
            <g:additional_image_link>https://image2.jpg</g:additional_image_link>
        </item>
        <item>
            <title>B</title>
            <g:additional_image_link>https://image3.jpg</g:additional_image_link>
            <g:additional_image_link>https://image4.jpg</g:additional_image_link>
            <g:additional_image_link>https://image4.jpg</g:additional_image_link>
        </item>
    </channel>
</rss>

Solution

  • The reader you are using (whichever it is) may fail on this input.

    Instead, you can load the XML with DOMDocument, which has good error handling, and retrieve all additional_image_link elements with getElementsByTagName().

    Demo: https://3v4l.org/17F8a

    $xml = <<<'_XML'
    <?xml version="1.0" encoding="UTF-8" ?>
    <rss version="2.0" xmlns:g="http://base.google.com/ns/1.0" xmlns:atom="http://www.w3.org/2005/Atom">
        <channel>
    <item>
      <title>A</title>
      <g:additional_image_link>https://image1.jpg</g:additional_image_link>
      <g:additional_image_link>https://image2.jpg</g:additional_image_link>
    </item>
    <item>
      <title>B</title>
      <g:additional_image_link>https://image3.jpg</g:additional_image_link>
      <g:additional_image_link>https://image4.jpg</g:additional_image_link>
      <g:additional_image_link>https://image4.jpg</g:additional_image_link>
    </item>
        </channel>
    </rss>
    _XML;
    
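    // Collect libxml parse errors internally instead of emitting PHP warnings.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadXML($xml);
    // The elements are matched by local name here, so the "g:" prefix is left out.
    foreach ($dom->getElementsByTagName('additional_image_link') as $link) {
        echo $link->textContent, PHP_EOL;
    }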
    

    Output

    https://image1.jpg
    https://image2.jpg
    https://image3.jpg
    https://image4.jpg
    https://image4.jpg
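
If you need the links in an array rather than echoed, one option is DOMXPath::query(), which returns every matching node. In XPath 1.0, string() converts a node set to the string value of its first node only, which would explain the single result in the question. The following is a minimal sketch, assuming the feed is small enough to load into memory; the variable name $imagesByTitle and the grouping by <title> are illustrative choices and not part of the answer above.

```php
<?php
// Sample feed copied from the question, reduced to the relevant elements.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
    <channel>
        <item>
            <title>A</title>
            <g:additional_image_link>https://image1.jpg</g:additional_image_link>
            <g:additional_image_link>https://image2.jpg</g:additional_image_link>
        </item>
        <item>
            <title>B</title>
            <g:additional_image_link>https://image3.jpg</g:additional_image_link>
            <g:additional_image_link>https://image4.jpg</g:additional_image_link>
            <g:additional_image_link>https://image4.jpg</g:additional_image_link>
        </item>
    </channel>
</rss>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
// Bind the "g" prefix explicitly so the query does not rely on the context node.
$xpath->registerNamespace('g', 'http://base.google.com/ns/1.0');

// Hypothetical result shape: item title => list of image URLs.
$imagesByTitle = [];
foreach ($xpath->query('//item') as $item) {
    $title = $xpath->evaluate('string(title)', $item);
    $links = [];
    // query() yields a DOMNodeList containing every match for this item.
    foreach ($xpath->query('g:additional_image_link', $item) as $node) {
        $links[] = $node->textContent;
    }
    $imagesByTitle[$title] = $links;
}

print_r($imagesByTitle);
```

The same relative query should also work against a node returned by $reader->expand($document) as in the question, since count() already resolved the g: prefix there, but that variant has not been tested here.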