php xml cdata xenforo

SimpleXML can't get CDATA with ns prefixes


I've been struggling to get CDATA out of an XML file for the past few hours, even after trying the different methods suggested in several related answers.

My problem involves retrieving thread data through XenForo's RSS feeds. Here is a sample of the RSS data I'm trying to read; everything works fine except for retrieving the contents of <content:encoded>.

Sample file:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>News &amp; Announcements</title>
    <description>All of our important news and announcements will be here.</description>
    <pubDate>Fri, 26 Jun 2015 14:54:20 +0000</pubDate>
    <lastBuildDate>Fri, 26 Jun 2015 14:54:20 +0000</lastBuildDate>
    <generator>********* ****</generator>
    <link>https://***.****.****/forum/news/</link>
    <atom:link rel="self" type="application/rss+xml" href="https://***.****.****/forum/news/index.rss"/>
    <item>
      <title>Site under development.</title>
      <pubDate>Thu, 25 Jun 2015 05:49:43 +0000</pubDate>
      <link>https://***.****.****/threads/site-under-development.3/</link>
      <guid>https://***.****.****/threads/site-under-development.3/</guid>
      <author>[email protected] (*****)</author>
      <dc:creator>ShortCut Central</dc:creator>
      <content:encoded><![CDATA[Content to retrieve. <br /> Some more content a part of the same section]]></content:encoded>
    </item>
  </channel>
</rss>

My current code looks like this:

<?php
class SCC_Main_miscFuncs {
    public static function printMostRecentPost() {
        // Re-enable the below once we're ready to release
        //$rssUrl = func_get_arg(1);
        $rssUrl = 'https://www.shortcutcentral.org/indev.rss';
        $rawData = self::returnContents($rssUrl); // Properly contains the CDATA
        $xml = simplexml_load_string($rawData); // Parse the already-fetched feed instead of downloading it twice
        echo '<pre>';
        //echo (string) $xml->channel->item->encoded;
        //echo (string) $xml->channel->item->content;
        //var_dump($xml);
        echo '</pre>';
        //echo (string) $xml->channel->item;
        //echo $array[@attributes]['item']['link'];
        //echo $xml->message;
    }

    public static function returnContents($url){
        $curl_handle=curl_init();
        curl_setopt($curl_handle, CURLOPT_URL,$url);
        curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
        curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl_handle, CURLOPT_USERAGENT, 'ShortCut Central');
        $query = curl_exec($curl_handle);
        curl_close($curl_handle);
        return $query;
    }
} 

Nothing seems to show the CDATA except the unparsed $rawData. It may be that I'm not accessing it properly (I'm completely new to XML, namespaces, and namespace prefixes), but the fact that it doesn't even show up in var_dump has me stumped. I saw some earlier posts about using XML children, but I don't entirely understand that concept, so if the solution requires children, an explanation would be greatly appreciated.

Thank you!

It may also be worth mentioning that my PHP code is organized this way (classes with public static functions) so that I can use it as an add-on for XenForo.


Solution

  • You are correct that one way to reach a namespaced node in SimpleXML is SimpleXMLElement::children(), but you must pass the namespace as its first argument. You can pass the full namespace URI, "http://purl.org/rss/1.0/modules/content/", but it is easier to pass the prefix, "content", and supply TRUE as the second argument to tell children() that you are passing a prefix rather than the full URI.

    So using an expression on your $xml object like:

    echo (string)$xml->channel->item->children('content', TRUE)->encoded;
    // Prints:
    // Content to retrieve. <br /> Some more content a part of the same section
    

    To retrieve every relevant node, loop over the items with whatever construct best fits your code.
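    As a minimal sketch of such a loop: the inline feed below is a trimmed-down stand-in for the sample in the question, and the names $feed and $bodies are illustrative rather than part of the original code.

```php
<?php
// Trimmed stand-in for the feed in the question (two items so the loop is visible).
$feed = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>First thread</title>
      <content:encoded><![CDATA[First <br /> body]]></content:encoded>
    </item>
    <item>
      <title>Second thread</title>
      <content:encoded><![CDATA[Second body]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($feed);

$bodies = [];
foreach ($xml->channel->item as $item) {
    // Un-namespaced children such as <title> are reachable directly...
    $title = (string) $item->title;
    // ...while <content:encoded> has to be reached through children().
    $bodies[$title] = (string) $item->children('content', true)->encoded;
}

print_r($bodies);
```

    Keying the array by title is only a convenience for the demonstration; in real code you would probably collect whatever fields your add-on needs per item.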

    Retrieving attributes from namespaced nodes isn't much different. For example, to get the href attribute of <atom:link>:

    echo (string)$xml->channel->children('atom', true)->link->attributes()['href'];
    // Prints:
    // https://***.****.****/forum/news/index.rss
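
    A prefix is only an alias chosen by the document, so the full-URI form of children() mentioned earlier may be the safer choice if a feed ever declares different prefixes. The following is a rough sketch under that assumption: the constant names NS_CONTENT and NS_ATOM, the example.com URL, and the deliberately renamed prefixes "a" and "c" are all made up for illustration.

```php
<?php
// Namespace URIs copied from the sample feed's <rss> element.
const NS_CONTENT = 'http://purl.org/rss/1.0/modules/content/';
const NS_ATOM    = 'http://www.w3.org/2005/Atom';

// Cut-down feed; the prefixes ("a", "c") intentionally differ from the sample.
$feed = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:a="http://www.w3.org/2005/Atom" xmlns:c="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <a:link rel="self" href="https://example.com/forum/news/index.rss"/>
    <item>
      <c:encoded><![CDATA[Content to retrieve.]]></c:encoded>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($feed);

// Second argument omitted (defaults to false): the first argument is a namespace URI.
$body = (string) $xml->channel->item->children(NS_CONTENT)->encoded;
$href = (string) $xml->channel->children(NS_ATOM)->link->attributes()['href'];

echo $body, "\n", $href, "\n";
```

    The lookups match on the namespace URI, so they keep working regardless of which prefix the feed happens to use.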