php, curl, file-get-contents, rss-reader

PHP file_get_contents is slow and returns 500 Internal Server Error


I'm trying to read the RSS feed of a news agency site and save several fields of each news item in my database. I used PHP functions such as file_get_contents and cURL, but it takes about a minute to fetch the site's content and parse it to extract the parts of the news I want.

This is the part of my code that gets the news details from the RSS feed:

$rss = new DOMDocument();
$rss->load('http://isna.ir/fa/Sports/feed');
$feed = array();
foreach ($rss->getElementsByTagName('item') as $node) {
    $item = array ( 
        'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'category' => $node->getElementsByTagName('category')->item(0)->nodeValue,
        'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
        'date' => $node->getElementsByTagName('pubDate')->item(0)->nodeValue,
        );
    array_push($feed, $item);
}
$title = str_replace(' & ', ' &amp; ', $feed[0]['title']);
$link = $feed[0]['link'];
$category = $feed[0]['category'];
$date = date('l F d, Y', strtotime($feed[0]['date']));

And in this part I use the news link to get a photo from the original news page:

$context = stream_context_create(array('http' => array('header'=>'Connection: close')));

$f = explode("news", $link);
$photo_link = $f[0]. 'photo' .$f[1];

$ff = file_get_contents($photo_link, false, $context);
$f1 = explode('<div class="news-image">', $ff);
$f2 = explode('<h1', $f1[1]);
$f3 = explode('href="', $f2[0]);
$f4 = explode('">', $f3[1]);
$image = $f4[0];

echo '<img src="' .$image. '"></img>';

And this is the result most of the time:

Warning: file_get_contents(http://isna.ir/fa/photo/92040301515/مدافع-تیم-ملی-آلمان-از-فوتبال-خداحافظی-کرد) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 500 Internal Server Error in /opt/lampp/htdocs/example8/reader.php

I tried the cURL functions too, but the results were not much better.


Solution

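  • The last segment of the link contains non-ASCII (Persian) characters, so URL-encode it before requesting the page. Try this: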

    <?php
      $photo_link = explode( "news", $link );
    
      $first  = $photo_link[0];
      $last   = str_replace( basename( $photo_link[1] ), urlencode( basename( $photo_link[1] ) ), $photo_link[1] );
    
      $photo_link = $first."news".$last;
      print_r( file_get_contents( $photo_link, false, $context ) );
    ?>
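The encoding step above can also be wrapped in a small helper. This is only a sketch under assumptions, not the answerer's code: `encode_last_segment` is a hypothetical name, it uses `rawurlencode()` instead of `urlencode()` (so spaces become `%20` rather than `+`), it splits on the last `/` instead of calling `basename()` (whose handling of multibyte paths depends on the locale), and it assumes the URL has a path and no query string. The sample link is the URL from the warning with `photo` swapped back to `news`, which is what the question's `explode("news", $link)` implies. It needs PHP 7.0 or later for the type declarations.

```php
<?php
// Hypothetical helper (not part of the original answer): percent-encode only
// the last path segment of a URL, leaving scheme, host and earlier segments
// untouched. Assumes the URL has a path and carries no query string.
function encode_last_segment(string $url): string
{
    $pos = strrpos($url, '/');
    if ($pos === false) {
        return $url; // no slash at all, nothing to encode
    }
    $head = substr($url, 0, $pos + 1);
    $tail = substr($url, $pos + 1);
    // Decode first so a segment that is already encoded is not encoded twice.
    return $head . rawurlencode(rawurldecode($tail));
}

// Assumed shape of a news link, derived from the URL in the warning message.
$link = 'http://isna.ir/fa/news/92040301515/مدافع-تیم-ملی-آلمان-از-فوتبال-خداحافظی-کرد';

// Switch the "news" path segment to "photo", then encode the Persian slug.
$photo_link = encode_last_segment(str_replace('/news/', '/photo/', $link));
echo $photo_link, "\n";
```

The result can then be passed to `file_get_contents( $photo_link, false, $context )` exactly as in the snippet above.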
    

    With the URL-encoding fix applied, your full code will look something like this:

    <?php
      $feed = array();
      $rss  = new DOMDocument();
      $rss->load( 'http://isna.ir/fa/Sports/feed' );
    
      foreach( $rss->getElementsByTagName( 'item' ) as $node ) {
        $feed[] = array(
          'title'     =>  str_replace( " & ", " &amp; ", $node->getElementsByTagName( 'title' )->item(0)->nodeValue ),
          'category'  =>  $node->getElementsByTagName( 'category' )->item(0)->nodeValue,
          'link'      =>  $node->getElementsByTagName( 'link' )->item(0)->nodeValue,
          'date'      =>  strtotime( $node->getElementsByTagName( 'pubDate' )->item(0)->nodeValue )
        );
      }
    
      $title    = $feed[0]["title"];
      $link     = $feed[0]["link"];
      $category = $feed[0]["category"];
      $date     = date( "l F d, Y", $feed[0]["date"] );
    
      print_r( $feed );
    
      $context  = stream_context_create(
        array(
          'http'  =>  array(
            'header'  =>  'Connection: close'
          )
        )
      );
    
      $f  = explode( "news", $link );
    
      /** My Code Starts **/
      $f[1] = str_replace( basename( $f[1] ), urlencode( basename( $f[1] ) ), $f[1] );
      /** My Code Ends **/
    
      $photo_link = $f[0]."photo".$f[1];
    
      $ff = file_get_contents( $photo_link, false, $context );
      $f1 = explode( '<div class="news-image">', $ff );
      $f2 = explode( '<h1', $f1[1] );
      $f3 = explode( 'href="', $f2[0] );
      $f4 = explode( '">', $f3[1] );
      $image  = $f4[0];
    
      echo '<img src="'.$image.'"></img>';
    ?>
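The chain of `explode()` calls that pulls the image URL out of the page fails with undefined-offset notices whenever the markup differs from what is expected. As a hedged alternative, the same lookup can be sketched with `DOMDocument` and `DOMXPath`. The function name `extract_image_url` and the sample markup are assumptions made for illustration; the real page structure is not shown in the question beyond the `<div class="news-image">` marker. It needs PHP 7.1 or later with the DOM extension enabled.

```php
<?php
// Hypothetical helper: return the first href found inside
// <div class="news-image">, or null when the markup does not match.
function extract_image_url(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    $xpath = new DOMXPath($doc);
    // First href attribute on any element nested in the news-image div.
    $nodes = $xpath->query('//div[@class="news-image"]//@href');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->nodeValue;
}

// Stand-in markup, assumed to resemble what the explode() calls expect.
$sample = '<html><body><div class="news-image">'
        . '<a href="http://example.com/a.jpg"><img src="thumb.jpg"></a>'
        . '</div><h1>Title</h1></body></html>';

$image = extract_image_url($sample);
if ($image !== null) {
    // Escape the URL before putting it into an attribute.
    echo '<img src="' . htmlspecialchars($image, ENT_QUOTES) . '">', "\n";
}
```

In real use the HTML would come from `file_get_contents()`, which returns `false` on failure (for example on the 500 response shown in the question), so that return value should be checked before it is passed to the parser.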