I'm trying to scrape the betfair.com site with this PHP code:
<?php
// Defining the basic cURL function
function curl($url) {
$ch = curl_init(); // Initialising cURL
curl_setopt($ch, CURLOPT_URL, $url); // Setting cURL's URL option with the $url variable passed into the function
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
$data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
curl_close($ch); // Closing cURL
return $data; // Returning the data from the function
}
$scraped_website = curl("https://www.betfair.com/exchange/football");
echo $scraped_website;
?>
Written this way, the code works.
But if I replace "https://www.betfair.com/exchange/football" with "https://www.betfair.com/exchange/football/event?id=28040884", the code stops working.
Can anyone help, please?
Look at the headers curl receives:
HTTP/1.1 302 Moved Temporarily
Location: https://www.betfair.com/exchange/plus/#/football/event/28040884
Cache-Control: no-cache
Pragma: no-cache
Date: Fri, 09 Dec 2016 17:38:52 GMT
Age: 0
Transfer-Encoding: chunked
Connection: keep-alive
Server: ATS/5.2.1
Set-Cookie: vid=00956994-084c-444b-ad26-38b1119f4e38; Domain=.betfair.com; Expires=Mon, 01-Dec-2022 09:00:00 GMT; Path=/
X-Opaque-UUID: 80506a77-12c1-4c89-b4a6-fa499fd23895
https://www.betfair.com/exchange/football/event?id=28040884 actually sends a 302 Moved Temporarily HTTP redirect, and your script does not follow redirects. curl_exec() therefore returns only the body of the 302 response itself (typically empty) instead of the page named in the Location header, which is why it's not working. Enable CURLOPT_FOLLOWLOCATION and your code works fine. Fixed code:
function curl($url) {
$ch = curl_init(); // Initialising cURL
curl_setopt($ch, CURLOPT_URL, $url); // Setting cURL's URL option with the $url variable passed into the function
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
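curl_setopt($ch, CURLOPT_VERBOSE, TRUE); // Printing request/response details to STDERR for debugging
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Following HTTP redirects (Location: headers)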
$data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
curl_close($ch); // Closing cURL
return $data; // Returning the data from the function
}
var_dump(curl("https://www.betfair.com/exchange/football/event?id=28040884"));
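To check for a redirect like the one in the headers above, one option is to ask curl for the response headers only and pull out the status code and Location value. The sketch below is a minimal illustration, not part of the original script: fetch_headers() and parse_redirect_headers() are hypothetical helper names, and it assumes the server answers a HEAD request (which CURLOPT_NOBODY sends) the same way it answers a GET.

```php
<?php
// Hypothetical helper: extracts the status code and Location value
// from a raw HTTP header block such as the one shown above.
function parse_redirect_headers(string $rawHeaders): array {
    $status = null;
    $location = null;
    foreach (preg_split('/\r\n|\n|\r/', trim($rawHeaders)) as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1]; // e.g. 302
        } elseif (stripos($line, 'Location:') === 0) {
            $location = trim(substr($line, strlen('Location:')));
        }
    }
    return ['status' => $status, 'location' => $location];
}

// Hypothetical helper: fetches only the response headers of $url,
// without following redirects, so the 302 itself stays visible.
function fetch_headers(string $url): string {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true); // include headers in the returned string
    curl_setopt($ch, CURLOPT_NOBODY, true); // send a HEAD request, skip the body
    $headers = curl_exec($ch);
    curl_close($ch);
    return $headers === false ? '' : $headers;
}
```

Calling parse_redirect_headers(fetch_headers($url)) on the event URL should then report the 302 status and its Location target, assuming the site still responds as it did in the headers quoted above.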
(I would also recommend setting CURLOPT_ENCODING to an empty string, ''. That makes curl request a compressed transfer using every encoding it was built with. HTML compresses very well with gzip, which curl is usually compiled to support, so the page downloads faster and curl_exec() returns sooner.)
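Putting the redirect fix and the compression suggestion together might look like the sketch below. It is only one possible arrangement: scraper_curl_options() and curl_get() are hypothetical names, the CURLOPT_MAXREDIRS cap of 5 is an arbitrary assumption, and the error check is an addition that the original function does not have.

```php
<?php
// Hypothetical helper: gathers the options discussed above in one array.
function scraper_curl_options(string $url): array {
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the page instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow 302 and other redirects
        CURLOPT_MAXREDIRS      => 5,    // assumption: arbitrary cap to avoid redirect loops
        CURLOPT_ENCODING       => '',   // empty string: accept every encoding this curl build supports
    ];
}

// Hypothetical helper: fetches $url and throws if the transfer fails.
function curl_get(string $url): string {
    $ch = curl_init();
    curl_setopt_array($ch, scraper_curl_options($url));
    $data = curl_exec($ch);
    if ($data === false) {
        $error = curl_error($ch); // read the message before closing the handle
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }
    curl_close($ch);
    return $data;
}
```

Usage would be along the lines of var_dump(curl_get("https://www.betfair.com/exchange/football/event?id=28040884")); wrapped in a try/catch if a failed transfer should not stop the script.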