Tags: php, performance, rss, feed

PHP RSS feed reader efficiency


I'm reading data from an XML feed as follows:

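// Load the whole feed into memory, then parse it in one go.
$data = file_get_contents("mydata.rss");
$data = simplexml_load_string($data);

// Collect the fields of interest from every item in the feed.
$articles = array();
foreach ($data->channel->item as $item) {
    $articles[] = array(
        'description' => (string)$item->description,
        'link'        => (string)$item->link,
        'pubDate'     => (string)$item->pubDate,
    );
}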

The issue is that the feed is very long, with maybe 100 items, and I only want to read the first 10. I can work around this by manually setting a counter and using an if statement inside the foreach loop, but I don't think that is the best approach: the entire feed is still being read and parsed, which adds unnecessary overhead.

What's the most efficient way of achieving this without reading the entire feed?

Thanks in advance...


Solution

  • As you say, SimpleXML loads the whole file into memory and parses it in full. You then iterate over elements that are already held in memory.

    A SAX-like parser such as PHP's "XML Parser" extension lets you avoid reading the full file. I don't know exactly how it is implemented, but the SAX approach is to fire an event every time an element is opened or closed. You can therefore start reading the RSS and stop parsing as soon as the 10th "item" element is closed. This only saves work if the file is fed to the parser in chunks rather than loaded whole with file_get_contents.

    This approach has a smaller memory footprint and should be faster when you only need the first few items. On the other hand, iterating over the elements of the XML is not as easy as with SimpleXML.
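The SAX-style approach described above could be sketched roughly as follows, using the XML Parser extension (xml_parser_create, xml_set_element_handler, xml_set_character_data_handler, xml_parse). This is only a minimal sketch: readFirstItems() and its parameters are hypothetical names, and it assumes a plain RSS 2.0 layout where description, link and pubDate are direct children of item.

```php
<?php
// Hypothetical helper: return the first $limit items of an RSS file,
// feeding the parser $chunkSize bytes at a time and stopping early.
function readFirstItems(string $path, int $limit = 10, int $chunkSize = 4096): array
{
    $wanted   = array('description', 'link', 'pubDate');
    $articles = array();
    $current  = null; // item being built, or null while outside <item>
    $field    = null; // wanted child element currently open, if any

    $parser = xml_parser_create();
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

    xml_set_element_handler(
        $parser,
        // Called whenever an element is opened.
        function ($p, $name, $attrs) use (&$current, &$field, $wanted) {
            if ($name === 'item') {
                $current = array_fill_keys($wanted, '');
            } elseif ($current !== null && in_array($name, $wanted, true)) {
                $field = $name;
            }
        },
        // Called whenever an element is closed.
        function ($p, $name) use (&$current, &$field, &$articles) {
            if ($name === 'item' && $current !== null) {
                $articles[] = $current;
                $current = null;
            } elseif ($name === $field) {
                $field = null;
            }
        }
    );

    // Text may arrive in several pieces, so append rather than assign.
    xml_set_character_data_handler(
        $parser,
        function ($p, $text) use (&$current, &$field) {
            if ($current !== null && $field !== null) {
                $current[$field] .= $text;
            }
        }
    );

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Stop reading as soon as enough items have been closed.
    while (count($articles) < $limit && !feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break;
        }
        if (!xml_parse($parser, $chunk, feof($handle))) {
            $message = xml_error_string(xml_get_error_code($parser));
            fclose($handle);
            throw new RuntimeException("XML error: $message");
        }
    }
    fclose($handle);

    // One chunk can close several items, so trim any overshoot.
    return array_slice($articles, 0, $limit);
}
```

Calling readFirstItems("mydata.rss", 10) should return an array shaped like $articles in the question, without reading the rest of the file once the 10th item has been closed. A smaller chunk size stops closer to the 10th item at the cost of more read calls.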