
PHP: reading an RSS feed cannot read the <a10:content type="text/xml"> tag


I'm trying to read an RSS feed using PHP, but for some reason it cannot read this content tag:

<a10:content type="text/xml">...</a10:content>

This is an example of what the feed and its items could look like:

<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
    <channel>
        <title>mMin title</title>
        <description>Some description</description>
        <managingEditor>[email protected]</managingEditor>
        <category>Some category</category>
        <item>
            <guid isPermaLink="false">1</guid>
            <link>https://example.com/1</link>
            <title>Some title 1</title>
            <a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
            <a10:content type="text/xml">
                <Location>San diego</Location>
                <PublishedOn>2016-10-21T11:21:07</PublishedOn>
                <Body>Lorem ipsum dolar</Body>
                <JobCountry>USA</JobCountry>
            </a10:content>
        </item>
        <item>
            <guid isPermaLink="false">1</guid>
            <link>https://example.com/2</link>
            <title>Some title 2</title>
            <a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
            <a10:content type="text/xml">
                <Location>Detroit</Location>
                <PublishedOn>2016-10-21T11:21:07</PublishedOn>
                <Body>Lorem ipsum dolar</Body>
                <JobCountry>USA</JobCountry>
            </a10:content>
        </item>
        <item>
            <guid isPermaLink="false">1</guid>
            <link>https://example.com/3</link>
            <title>Some title 3</title>
            <a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
            <a10:content type="text/xml">
                <Location>Los Angeles</Location>
                <PublishedOn>2016-10-21T11:21:07</PublishedOn>
                <Body>Lorem ipsum dolar</Body>
                <JobCountry>USA</JobCountry>
            </a10:content>
        </item>
    </channel>
</rss>

Here is my code.

    $url = "http://example.com/RSSFeed";
    $xml = simplexml_load_file($url);

    foreach ($xml->channel as $x) {
        foreach ($x->item as $item) {

            dd($item);
        }
    }

This outputs:

    SimpleXMLElement {#111 ▼
      +"guid": "1"
      +"link": "https://example.com"
      +"title": "Some title"
    }

Here is my expected output

    SimpleXMLElement {#111 ▼
      +"guid": "1"
      +"link": "https://example.com"
      +"title": "Some title"
      +"content" {
        0 => {
            +"Location": "San Diego"
            +"PublishedOn": "2016-10-21T11:21:07"
            +"Body": "Lorem ipsum dolar"
            +"JobCountry": "USA"
        }
        1 => {
            +"Location": "Detroit"
            +"PublishedOn": "2016-10-21T11:21:07"
            +"Body": "Lorem ipsum dolar"
            +"JobCountry": "USA"
        }
        2 => {
            +"Location": "Los Angeles"
            +"PublishedOn": "2016-10-21T11:21:07"
            +"Body": "Lorem ipsum dolar"
            +"JobCountry": "USA"
        }
      }
    }

Does anyone have a solution for this?


Solution

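  • You need to use the namespace to access the element. The a10: prefix is bound to the Atom namespace, and reading properties on a SimpleXMLElement only returns elements that have no namespace, which is why a10:updated and a10:content are missing from your dump. Here we use DOMDocument instead: its getElementsByTagNameNS method takes the namespace URI (http://www.w3.org/2005/Atom) and the local name (content) and returns the matching elements, from which the expected output can be built.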

    If you prefer simplexml_load_string, the same namespace URI can be passed to SimpleXMLElement::children() to reach the a10: elements.

    Try this code snippet:

    <?php
    
    ini_set('display_errors', 1);
    
    libxml_use_internal_errors(true);   
    $string=<<<HTML
    <rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
        <channel>
            <title>mMin title</title>
            <description>Some description</description>
            <managingEditor>[email protected]</managingEditor>
            <category>Some category</category>
            <item>
                <guid isPermaLink="false">1</guid>
                <link>https://example.com</link>
                <title>Some title</title>
                <a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
                <a10:content type="text/xml">
                    <Location>Detroit</Location>
                    <PublishedOn>2016-10-21T11:21:07</PublishedOn>
                    <Body>Lorem ipsum dolar</Body>
                    <JobCountry>USA</JobCountry>
                </a10:content>
            </item>
        </channel>
    </rss>
    HTML;
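    $completeData=array();
    $domDocument = new DOMDocument();
    $domDocument->loadXML($string);
    // Match <content> by namespace URI, so the prefix used in the feed does not matter.
    $results=$domDocument->getElementsByTagNameNS("http://www.w3.org/2005/Atom", "content");
    foreach($results as $result)
    {
        // Start a fresh array for every <a10:content>; otherwise values pile up across items.
        $data=array();
        foreach($result->childNodes as $node)
        {
            // Skip whitespace text nodes and keep only child elements.
            if($node instanceof DOMElement)
            {
                // Key each value by its tag name (Location, PublishedOn, ...).
                $data[$node->nodeName]=$node->nodeValue;
            }
        }
        $completeData[]=$data;
    }
    print_r($completeData);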
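
For the simplexml_load_string route mentioned above, here is a minimal sketch of how it might look. The helper name readItems and the constant ATOM_NS are made up for illustration, and the feed is a trimmed copy of the one in the question. It assumes the elements inside a10:content carry no namespace of their own, as in the sample.

```php
<?php
// Trimmed copy of the feed from the question. For a live feed you would
// load the XML from its URL instead of a string.
$feed = <<<XML
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
    <channel>
        <title>mMin title</title>
        <item>
            <guid isPermaLink="false">1</guid>
            <link>https://example.com/1</link>
            <title>Some title 1</title>
            <a10:content type="text/xml">
                <Location>San diego</Location>
                <JobCountry>USA</JobCountry>
            </a10:content>
        </item>
        <item>
            <guid isPermaLink="false">1</guid>
            <link>https://example.com/2</link>
            <title>Some title 2</title>
            <a10:content type="text/xml">
                <Location>Detroit</Location>
                <JobCountry>USA</JobCountry>
            </a10:content>
        </item>
    </channel>
</rss>
XML;

// Namespace URI that the a10: prefix is bound to in the feed.
const ATOM_NS = 'http://www.w3.org/2005/Atom';

// Hypothetical helper: turns every <item> into a plain array, including
// the fields nested inside <a10:content>.
function readItems(string $xmlString): array
{
    $xml = simplexml_load_string($xmlString);
    $items = [];

    foreach ($xml->channel->item as $item) {
        $row = [
            'guid'    => (string) $item->guid,
            'link'    => (string) $item->link,
            'title'   => (string) $item->title,
            'content' => [],
        ];

        // children(ATOM_NS) selects the item's children in the Atom
        // namespace, which plain property access does not return.
        $content = $item->children(ATOM_NS)->content;

        // The nested fields have no namespace, so children() without
        // arguments returns them.
        foreach ($content->children() as $field) {
            $row['content'][$field->getName()] = (string) $field;
        }

        $items[] = $row;
    }

    return $items;
}

$items = readItems($feed);
print_r($items);
```

Each entry in $items then carries its own content array keyed by tag name, so the first item's Location is read as $items[0]['content']['Location']. Passing the namespace URI rather than the a10 prefix keeps the code working if the feed ever declares the same namespace under a different prefix.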