Unable to download remote image with PHP CURL


Edit: I contacted support at scrapestack, and they confirmed that their API doesn't support image files.

I am trying to download a remote image using cURL with PHP. My code is below. Whenever I try to open the downloaded image, I always get:

Cannot read this file. This is not a valid bitmap file, or its format is not currently supported.

Does anyone know what is wrong with my code? Thank you.

$image ="http://api.scrapestack.com/scrape?access_key=TOKEN-HERE&url=https://i.imgur.com/Cbiu8Ef.png";
$imageName = pathinfo( $image, PATHINFO_BASENAME );
$ch = curl_init();
curl_setopt( $ch, CURLOPT_URL, $image );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTP_CONTENT_DECODING, false);
$source = curl_exec( $ch );
$info = curl_getinfo($ch);
curl_close( $ch );
file_put_contents( $imageName, $source );

I am not able to open the file. When I tried to open it with Sublime, it got stuck at "Loading Image". When I opened it with Notepad, I got the following, which looks like a PNG image but is not a valid one. The file starts with �PNG

IHDR       �   q�I�    IDATx�k�]�u�o��(��_�M��m�8:���_r�G

You can see the file here: https://gofile.io/?c=cfsYf2

It looks like the problem comes from making the cURL request through Scrapestack, because if I point cURL at the image URL directly, the image is downloaded correctly, like below:

$image ="https://i.imgur.com/Cbiu8Ef.png";

Solution

  • It looks like the response you get is a corrupt PNG image.

    If you are using a PHP version prior to 5.1.3, you need to specify an additional option for binary data transfers such as images:

    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
    

    If the above option doesn't solve the issue, you may try setting

    curl_setopt($ch, CURLOPT_HTTP_CONTENT_DECODING, false);
    

    in case the response has the Content-Encoding header set incorrectly, which would let curl apply unwanted decoding to the raw output.
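
To narrow down whether the saved bytes are really an image, it may help to check the response before writing it to disk. The following is only a minimal sketch, not a confirmed fix: the helper names `starts_with_png_signature` and `fetch_image` are hypothetical and not part of the original code, and `getimagesizefromstring` only reads the image header, so it can catch a non-image response (for example an error page) but not every kind of corruption inside the file.

```php
<?php
// Hypothetical helpers for illustration; names are assumptions, not from the question.

/** Returns true when $bytes begins with the 8-byte PNG file signature. */
function starts_with_png_signature(string $bytes): bool
{
    // strncmp is binary-safe; a string shorter than 8 bytes never matches.
    return strncmp($bytes, "\x89PNG\r\n\x1a\n", 8) === 0;
}

/**
 * Fetches $url with cURL and returns the raw body.
 * Throws if the transfer fails, the status is not 200,
 * or the body does not have a recognizable image header.
 */
function fetch_image(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects, if any

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error  = curl_error($ch);

    if ($body === false) {
        throw new RuntimeException("cURL error: $error");
    }
    if ($status !== 200) {
        throw new RuntimeException("Unexpected HTTP status: $status");
    }
    // Header-only check; @ suppresses notices raised for unreadable data.
    if (@getimagesizefromstring($body) === false) {
        throw new RuntimeException('Response does not look like an image');
    }
    return $body;
}
```

A possible use would be `file_put_contents('Cbiu8Ef.png', fetch_image($image));` wrapped in a `try`/`catch`, so that a bad response is reported instead of being saved as an unreadable file. If the signature check passes but the file still will not open, as in the question, the bytes were probably altered somewhere between the origin server and your script, which would be consistent with the proxy API not supporting image files.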