Get certain values from XML feed with a colon in the node name


I can't seem to find a way to properly get some values from the following XML feed:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:newznab="http://www.newznab.com/DTD/2010/feeds/attributes/" encoding="utf-8">
 <channel>
  <atom:link href="https://REMOVED.com/api" rel="self" type="application/rss+xml"/>
  <title>REMOVED</title>
  <description>API Details</description>
  <link>https://REMOVED.com/</link>
  <language>en-gb</language>
  <webMaster>[email protected]</webMaster>
  <category>Stuff</category>
  <generator>Me</generator>
  <ttl>10</ttl>
  <docs>https://removed.com/apihelp/</docs>
  <image url="https://removed.com/themes/shared/img/logo.png" title="REMOVED" link="https://removed.com/" description="Visit REMOVED"/>
  <newznab:response offset="0" total="125000"/>
  <item>
   <title>Fair.Go.2017.09.18.HDTV.x264-FiHTV </title>
   <guid isPermaLink="true">https://REMOVED.com/details/427d2b6c5fb3a0f73bd43be4bb8cff955700fd4d</guid>
   <link>https://REMOVED.com/getnzb/427d2b6c5fb3a0f73bd43be4bb8cff955700fd4d.nzb&amp;i=1&amp;r=3bc4e94ef14337e4e2b490a3897c48f6</link>
   <comments>https://REMOVED.com/details/427d2b6c5fb3a0f73bd43be4bb8cff955700fd4d#comments</comments>
   <pubDate>Tue, 19 Sep 2017 10:18:21 +0200</pubDate>
   <category>TV &gt; SD</category>
   <description>Fair.Go.2017.09.18.HDTV.x264-FiHTV </description>
   <enclosure url="https://REMOVED.com/getnzb/427d2b6c5fb3a0f73bd43be4bb8cff955700fd4d.nzb&amp;i=1&amp;r=3bc4e94ef14337e4e2b490a3897c48f6" length="168013625" type="application/x-nzb"/>
   <newznab:attr name="category" value="5030"/>
   <newznab:attr name="size" value="168013625"/>
   <newznab:attr name="files" value="17"/>
   <newznab:attr name="poster" value="[email protected] (yeahsure)"/>
   <newznab:attr name="prematch" value="1"/>
   <newznab:attr name="info" value="https://REMOVED.com/api?t=info&amp;id=427d2b6c5fb3a0f73bd43be4bb8cff955700fd4d&amp;r=3bc4e94ef14337e4e2b490a3897c48f6"/>
   <newznab:attr name="grabs" value="0"/>
   <newznab:attr name="comments" value="0"/>
   <newznab:attr name="password" value="0"/>
   <newznab:attr name="usenetdate" value="Tue, 19 Sep 2017 10:07:47 +0200"/>
   <newznab:attr name="group" value="alt.binaries.teevee"/>
  </item>
</channel>
</rss>

I need the value of <title> plus the values for size and usenetdate from those newznab:attr nodes, collected into an array. There is only one <item> in this sample, but the real feed contains hundreds.

That can't be hard in PHP, right? Yet XMLWriter, DOM and SimpleXML all failed me. Or I failed them.

Any pointers?


Solution

  • The difficulty comes from the namespaces. They are simple enough to deal with in any XML library; here I've used SimpleXML. I've also assumed that it's channel that is repeated.

    To query namespaced nodes with XPath, you register each namespace under a prefix so that the XPath engine can map the prefix in the expression to the namespace URI. Here I register newznab as the prefix for attr. The prefix does not have to match the one in the XML document, but using the same one makes the code easier to read. The predicate [@name='size'] selects the attr element whose name attribute is 'size', and the trailing /@value returns that element's value attribute.

    $xml = simplexml_load_file('NewFile.xml');
    // Map each prefix used in the XPath expressions to its namespace URI.
    $xml->registerXPathNamespace("atom", "http://www.w3.org/2005/Atom");
    $xml->registerXPathNamespace("newznab", "http://www.newznab.com/DTD/2010/feeds/attributes/");

    foreach( $xml->channel as $channel ){
        echo "Channel title=".(string)$channel->title.PHP_EOL;
        // xpath() returns an array of matches; [0] takes the first matching value attribute.
        echo "size=".(string)$channel->xpath("descendant::newznab:attr[@name='size']/@value")[0].PHP_EOL;
        echo "usenetdate=".(string)$channel->xpath("descendant::newznab:attr[@name='usenetdate']/@value")[0].PHP_EOL;
    }
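
The snippet above prints the first match per channel, while the question asks for an array with one entry per release. A minimal sketch of that variant follows. It assumes the repeated element is item rather than channel (which is what the sample feed suggests), and it inlines a trimmed copy of the feed so that it runs on its own. The names $feed and $releases are chosen here purely for illustration.

```php
<?php
// Trimmed copy of the feed from the question, inlined so the sketch is self-contained.
$feed = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:newznab="http://www.newznab.com/DTD/2010/feeds/attributes/">
 <channel>
  <title>REMOVED</title>
  <item>
   <title>Fair.Go.2017.09.18.HDTV.x264-FiHTV </title>
   <newznab:attr name="size" value="168013625"/>
   <newznab:attr name="usenetdate" value="Tue, 19 Sep 2017 10:07:47 +0200"/>
  </item>
 </channel>
</rss>
XML;

$xml = simplexml_load_string($feed);

$releases = [];
foreach ($xml->channel->item as $item) {
    // Register the prefix on the element that is actually queried.
    $item->registerXPathNamespace('newznab', 'http://www.newznab.com/DTD/2010/feeds/attributes/');

    // Paths are relative to the current item, so only its own attr children match.
    $size = $item->xpath("newznab:attr[@name='size']/@value");
    $date = $item->xpath("newznab:attr[@name='usenetdate']/@value");

    $releases[] = [
        'title'      => trim((string) $item->title),
        // Fall back to null when an item has no such attr, instead of indexing an empty result.
        'size'       => $size ? (string) $size[0] : null,
        'usenetdate' => $date ? (string) $date[0] : null,
    ];
}

print_r($releases);
```

With the real feed, swapping simplexml_load_string($feed) for simplexml_load_file() on the feed's path or URL should leave $releases with one entry per item.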