How to use simplexml_load_file to parse multiple feeds at once


Is it possible to extract the first item from each feed in a list of RSS feeds (i.e. from different sites) and display them together on one page, using PHP?

So far I'm using the code below, which works for a single feed, but I have no idea how to add more feeds besides verticalonline and echo the first item of each.

$rss_feed = simplexml_load_file("https://www.verticalonline.ro/feed");
if (!empty($rss_feed)) {
    $i = 0;

    foreach ($rss_feed->channel->item as $feed_item) {
        if ($i == 1) {
            break;
        } else {
            echo " <item> \n";
            echo "  <title>".$feed_item->title."</title>\n";
            echo "  <link><![CDATA[ https://actualitate.org/stire-".preg_replace("(^https?://)", "", $feed_item->link)." ]]></link>\n";
            echo "  <category>".$feed_item->category."</category>\n";
            echo '<pubDate>'.$feed_item->pubDate.' GMT</pubDate>'."\r\n";
            echo "  <description><![CDATA[ ".mb_strimwidth(trim(preg_replace(['/<[^>]*>/','/\s+/'],' ', $feed_item->description)), 0, 250, ' ...')." ]]></description>\n";
            if ($feed_item->enclosure) {
                echo '<enclosure url="'.$feed_item->enclosure->attributes()->url.'" type="image/jpeg" length="1967"/>'."\r\n";
            } else {
                echo '<enclosure url="http://loremflickr.com/300/250/reporter?'.rand(0,25).'" type="image/jpeg" length="1967"/>'."\r\n";
            }
            echo " </item>\n";
        }
        $i++;
    }
}

I imagine I'll need an array, but this is above my noob php knowledge. Please help.

Using Lajos's answer I've managed to do what I wanted, but I feel the script is taking too long to execute. For only 4 feeds the execution time is almost 4 seconds. I'm trying to find a way to read each feed and maybe stop reading after the first item is pulled. Hopefully this will lower execution time and resource usage.

This is what I have now:

$start = microtime(true);
$urls = [
    "feed 1",
    "feed 2",
    "feed 3",
    "feed 4",
];

foreach ($urls as $url) {
    $rss_feed = simplexml_load_file($url);
    if (!empty($rss_feed)) {
        $i = 0;

        foreach ($rss_feed->channel->item as $feed_item) {
            if ($i == 1) {
                break;
            } else {
                echo " <item> \n";
                echo "  <title>".$feed_item->title."</title>\n";
                echo "  <category>".$feed_item->category."</category>\n";
                echo '<pubDate>'.$feed_item->pubDate.' GMT</pubDate>'."\r\n";
                echo "  <description><![CDATA[ ".mb_strimwidth(trim(preg_replace(['/<[^>]*>/','/\s+/'],' ', $feed_item->description)), 0, 250, ' ...')." ]]></description>\n";
                if ($feed_item->enclosure) {
                    echo '<enclosure url="'.$feed_item->enclosure->attributes()->url.'" type="image/jpeg" length="1967"/>'."\r\n";
                } else {
                    echo '<enclosure url="http://loremflickr.com/300/250/reporter?'.rand(0,25).'" type="image/jpeg" length="1967"/>'."\r\n";
                }
                echo " </item>\n";
            }
            $i++;
        }
    }
}

$end = microtime(true);
$executionTime = $end - $start;
echo "Script execution time: " . $executionTime . " seconds";

Solution

  • Yes, you can create an array of your URLs, loop over it, load each RSS feed, take its first item, and proceed with what you planned:

    $urls = [
        "yoururl1",
        "yoururl2",
        "yoururl3",
        "yoururl4",
    ];
    
    foreach ($urls as $url) {
        $rss_feed = simplexml_load_file($url);
        /*
            Get your first item, maybe $rss_feed->channel->item[0], but you know that better
            and proceed doing what you would like to do with it
        */
    }
    

    EDIT

    Even though the solution from the initial answer already worked, it was not yet performant enough. The suggestion in the edit to the question was to fetch only one item. Whether that is possible depends on whether the RSS feed supports it, which is highly API-specific. In a general-purpose solution we cannot rely on such support, if it exists at all, because there is a collection of URLs to handle and we only know one of them (a Romanian news page).

    So, let's see how we can improve the approach in general. One obvious problem is that we send the n requests to the different URLs sequentially, which means that if T(i) is the time the i-th RSS request takes, then the current algorithm takes

    sum(T(i)), i = 1,n

    time. It would be nicer to have the page load first and the items fill in as their responses arrive. So I would suggest that the PHP request respond with a container holding n slots, one per feed, and that the requests be sent and parsed by JavaScript. First, let's see a proof of concept in which we mock the requests, that is, we do not actually send them yet, just to have a concise example of how the UI would be displayed:

    function handleItem(item, url) {
        //We simulate the time needed for request, we randomize it so that the
        //"articles" will not be responded to in order, just like in real-life
        let requestTime = Math.floor(10000 * Math.random());
        //We simulate request sending with setTimeout, you will need an AJAX
        //request here
        setTimeout(function() {
            //You will need to parse the response of the request here in the callback
            //and use that instead of my dummy test data
            item.innerHTML = `${url} responded`;
            item.classList.add("completed");
        }, requestTime);
    }
    
    window.addEventListener("load", function() {
        for (let item of document.querySelectorAll("#container > div")) {
            handleItem(item, item.getAttribute("data-url"));
        }
    });

    #container > div {
        background-color: red;
        border: 1px solid black;
        width: 100%;
        height: 40px;
    }
    
    #container > div.completed {
        background-color: green;
    }

    <div id="container">
        <div data-url="url1"></div>
        <div data-url="url2"></div>
        <div data-url="url3"></div>
        <div data-url="url4"></div>
        <div data-url="url5"></div>
        <div data-url="url6"></div>
        <div data-url="url7"></div>
        <div data-url="url8"></div>
        <div data-url="url9"></div>
        <div data-url="url10"></div>
    </div>
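
    To go from the mock to real requests, the setTimeout call could be replaced along the following lines. This is only a sketch: startAll and loadFeed are hypothetical names, not part of any library, and loadFeed stands for whatever actually fetches a feed and returns a Promise of the text to display. Because every request is started before any of them finishes, the total wait should be closer to the slowest T(i) than to their sum, ignoring bandwidth limits and the browser's cap on parallel connections.

    ```javascript
    // Starts one request per slot without waiting for earlier ones to finish.
    // `loadFeed(url)` is a hypothetical function returning a Promise that
    // resolves to the text to show in the slot.
    function startAll(slots, loadFeed) {
        return slots.map((slot) => {
            const url = slot.getAttribute("data-url");
            return loadFeed(url)
                .then((text) => {
                    // Fill the slot as soon as its own response arrives
                    slot.textContent = text;
                    slot.classList.add("completed");
                })
                .catch(() => {
                    // A failing feed must not block the other slots
                    slot.textContent = `${url} failed to load`;
                });
        });
    }

    // Possible browser usage, assuming the #container markup above:
    // startAll(
    //     Array.from(document.querySelectorAll("#container > div")),
    //     (url) => fetch(url).then((response) => response.text())
    // );
    ```

    Keep in mind that a browser can only read a feed from another site if that site sends CORS headers; otherwise the requests would probably have to go through a small script on your own server.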

    As for sending an AJAX request from your browser to an RSS feed and then parsing it, you can read the guidelines here: https://css-tricks.com/how-to-fetch-and-parse-rss-feeds-in-javascript/

    Apparently,

    new window.DOMParser().parseFromString(str, "text/xml")
    

    should do the trick as long as str is the response of the feed and it is well-formed, but I did not test that one.
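
    A minimal sketch of that parsing step, with the same caveat that it has not been tried against real feeds: firstItemFromRss is a hypothetical helper, and it assumes an RSS 2.0 layout where each entry is an <item> element containing title, link and pubDate elements. The parser argument exists only so the function can be tried outside a browser; in a page you would leave it out.

    ```javascript
    // Extracts the first <item> of an RSS document as a plain object.
    // `parser` defaults to the browser's DOMParser; it is a parameter only so
    // the function can be exercised with a stand-in object elsewhere.
    function firstItemFromRss(xmlText, parser = new DOMParser()) {
        const doc = parser.parseFromString(xmlText, "text/xml");

        // For malformed XML, browsers return a document containing a
        // <parsererror> element instead of throwing.
        if (doc.querySelector("parsererror")) {
            throw new Error("Feed is not well-formed XML");
        }

        const item = doc.querySelector("item");
        if (!item) {
            return null; // the feed has no items
        }

        // Text of the first matching element inside the item, or "" if missing
        const text = (tag) => {
            const node = item.querySelector(tag);
            return node ? node.textContent.trim() : "";
        };

        return {
            title: text("title"),
            link: text("link"),
            pubDate: text("pubDate"),
        };
    }
    ```

    Since only the first <item> is read, this mirrors what $rss_feed->channel->item[0] would give you on the PHP side.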