
PHP & CURL scraping


I have a problem: when I run this script in Google Chrome, I get a blank page. When I use a link to another website, it works. I do not know what is happening.

$curl = curl_init();

$url = "https://www.danmurphys.com.au/dm/home";
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($curl);

echo $output;

Solution

  • Several conditions can produce a blank result, such as:

    1. A curl error.
    2. A redirect with no response body that curl does not follow.
    3. The target host does not send any response body.

    So first you have to find out which of these applies.

    • For the first possibility, use curl_error and curl_errno to confirm that curl did not fail during the request.
    • For the second, set the CURLOPT_FOLLOWLOCATION option so that curl follows the redirect.
    • For the third possibility, use curl_getinfo. It returns an array that contains "size_download", the number of bytes downloaded, i.e. the length of the response body. If it is zero, that is why you see a blank page when printing it.

    One more thing: try var_dump to inspect the output (for debugging only). curl_exec may return false instead of a string, and echoing false (or null) prints nothing, so the page looks blank.

    Here is an example that uses all of them.

    <?php

    $curl = curl_init();
    $url = "https://www.danmurphys.com.au/dm/home";
    curl_setopt($curl, CURLOPT_URL, $url);
    // Return the response as a string instead of printing it directly.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    // Follow "Location:" redirects instead of stopping at the 3xx response.
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);

    $output = curl_exec($curl);
    // Read the diagnostics before closing the handle.
    $info = curl_getinfo($curl);
    $err = curl_error($curl);
    $ern = curl_errno($curl);
    curl_close($curl);

    if ($ern) {
        printf("An error occurred: (%d) %s\n", $ern, $err);
        exit(1);
    }

    printf("Response body size: %d\n", $info["size_download"]);

    // Debug only.
    // var_dump($output);

    echo $output;
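
The same three checks can also be wrapped in a small function, so that the script reports which case it hit. This is only a sketch: explain_blank_result is a hypothetical name, not part of the curl extension, and it assumes you pass in the values returned by curl_exec, curl_getinfo, curl_errno and curl_error.

```php
<?php

/**
 * Hypothetical helper (not part of the curl extension): explains why a
 * curl_exec() result would show up as a blank page.
 *
 * @param string|bool $output Return value of curl_exec().
 * @param array       $info   Return value of curl_getinfo().
 * @param int         $errno  Return value of curl_errno().
 * @param string      $error  Return value of curl_error().
 */
function explain_blank_result($output, array $info, int $errno, string $error): string
{
    // Case 1: curl itself failed (DNS, TLS, timeout, ...).
    if ($errno !== 0 || $output === false) {
        return sprintf("curl error (%d): %s", $errno, $error);
    }

    $status = (int) ($info["http_code"] ?? 0);
    $size = (int) ($info["size_download"] ?? 0);

    // Case 2: the final response is a 3xx redirect that was not followed.
    if ($status >= 300 && $status < 400) {
        $target = $info["redirect_url"] ?? "";
        if ($target === "") {
            $target = "unknown";
        }
        return sprintf("unfollowed redirect (HTTP %d) to %s", $status, $target);
    }

    // Case 3: the request succeeded but the body is empty.
    if ($size === 0) {
        return sprintf("empty response body (HTTP %d)", $status);
    }

    return sprintf("ok: %d bytes (HTTP %d)", $size, $status);
}
```

With the variables from the example above, you could call echo explain_blank_result($output, $info, $ern, $err); right before echo $output; to see which case applies.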
    

    Hope this can help you.

    Update:

    You can use CURLOPT_VERBOSE to see the request and response information in detail. Just add this:

    curl_setopt($curl, CURLOPT_VERBOSE, true);
    

    You do not need to print it yourself; curl writes it to STDERR while the request runs.
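
When the script runs behind a web server, as it does when you open it in Chrome, STDERR usually does not reach the browser. One way around that, sketched below, is to point CURLOPT_STDERR at an in-memory stream and read the log back after the request. fetch_with_verbose_log is a hypothetical name chosen for this example.

```php
<?php

/**
 * Hypothetical helper: fetches $url and returns the body together with
 * curl's verbose log, captured in memory instead of going to STDERR.
 */
function fetch_with_verbose_log(string $url): array
{
    $curl = curl_init($url);
    // Temporary read/write stream that will receive the verbose output.
    $log = fopen("php://temp", "w+");

    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_VERBOSE, true);
    // Send the verbose output to our stream instead of STDERR.
    curl_setopt($curl, CURLOPT_STDERR, $log);

    $body = curl_exec($curl);
    $errno = curl_errno($curl);

    // Go back to the start of the stream and read everything curl wrote.
    rewind($log);
    $verbose = stream_get_contents($log);

    return ["body" => $body, "errno" => $errno, "log" => $verbose];
}
```

For example, $result = fetch_with_verbose_log($url); followed by echo "<pre>", htmlspecialchars($result["log"]), "</pre>"; would show the log in the browser. Use this for debugging only, since the log can contain request headers.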