Tags: php, screen-scraping

Scraping a link in PHP not working, but link ok in browser


I am trying to scrape the contents of this page using PHP.

The link works in a browser, but when I request it with curl or get_file_contents, the booking.com website reports that the link is not valid. Could this be a firewall problem with my hosting company, reg-123?

Can anyone help please?

Code being used is as follows:

$url='https://secure-admin.booking.com/booking.html?bn=600861417&hotel_id=279299&l ang=en&code=049ae718b3d22164934cf621bece92ad&message_num=1';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; MSIE 9.0; WIndows NT 9.0; en-US)'); 

$result = curl_exec($ch);
echo $result;

Solution

  • The function is file_get_contents, not get_file_contents, and it returns the contents correctly when I try it. I also noticed an unwanted white space in your URL, just after 279299: "&l ang" should be "&lang".

    <?php
    $contents = file_get_contents("https://secure-admin.booking.com/booking.html?bn=600861417&hotel_id=279299&lang=en&code=049ae718b3d22164934cf621bece92ad&message_num=1");
    
    echo $contents;
    ?>
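
If you would rather stay with cURL, the same whitespace fix applies there. The following is only a minimal sketch, not a tested fix for this specific site: clean_url and fetch_url are hypothetical helper names introduced here for illustration, and the sketch assumes the stray space was the only thing making the link invalid. It strips whitespace from the URL before the request and reports a cURL failure instead of echoing an empty result.

```php
<?php
// Hypothetical helper: remove any whitespace that crept into a URL,
// for example through copy and paste ("&l ang" becomes "&lang").
function clean_url(string $url): string
{
    return (string) preg_replace('/\s+/', '', $url);
}

// Hypothetical helper: fetch a URL with cURL and throw on failure.
function fetch_url(string $url): string
{
    $ch = curl_init(clean_url($url));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() gives the reason the transfer failed
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    return $body;
}
```

Calling echo fetch_url($url); with the URL from the question would then send the request with the space removed. Whether booking.com accepts the request may still depend on things the sketch does not cover, such as cookies or an expired code parameter.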