Search code examples
phpfile-get-contents

Is there a way to get round a 403 error with php file_get_contents?


I'm trying to get a specific webpage using php file_get_contents - when I view the page directly there is no problem but when trying to grab it using php I get "failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden". Theres a piece of data that I'm trying to extract from the page.

$ft = file_get_contents('https://www.vesselfinder.com/vessels/CELEBRITY-MILLENNIUM-IMO-9189419-MMSI-249055000');

echo $ft;

I've read up on various pages here about using stream_context_create, mainly the user agent part

$context  = stream_context_create(
array(
    "http" => array(
        "header" => "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
    )
)

);

But nothing works and I now get a 400 error message. Unfortunately it doesn't look like my server is configured to use cURL so file_get_contents seems to be the only way for me to do this.


Solution

  • You need to add the User-Agent header to the actual header:

    $context  = stream_context_create(
      array(
        'http' => array(
          'header' => 'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
        ),
    ));
    

    You could also use the user_agent option:

    $context = stream_context_create(
      array(
        'http' => array(
          'user_agent' => 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
        ),
    ));
    

    Both above examples should work and you should now be able to get the contents using:

    $content = file_get_contents('https://www.vesselfinder.com/vessels/CELEBRITY-MILLENNIUM-IMO-9189419-MMSI-249055000', false, $context);
    
    echo $content;
    

    This could of course also be tested using curl from the command line. Notice that we are setting our own User-Agent header:

    curl --verbose -H 'User-Agent: YourApplication/1.0' 'https://www.vesselfinder.com/vessels/CELEBRITY-MILLENNIUM-IMO-9189419-MMSI-249055000'
    

    It might also be worth knowing that the default User-Agent used by curl seems to be blocked, so if using curl you need to add your own using the -H flag.