
cURL headers in command line show content-type as image/png, in PHP shows text/html?


I'm attempting to use cURL to download an external image file. When used from the command line, cURL correctly states the response headers with content-type=image/png. When I attempt to use cURL in PHP however, it returns content-type=text/html.

When I attempt to save the file using cURL in PHP, with the CURLOPT_BINARYTRANSFER option set to 1 and in conjunction with fopen/fwrite, the result is a corrupt file.

The only cURL flag I'm using on the command line is -A, to send a user agent with the request, which I've also done in PHP by calling curl_setopt($ch, CURLOPT_USERAGENT, ...).

The only thing I can think of that would cause this is perhaps some background request headers sent by cURL which aren't accounted for using the standard PHP functions?

For reference:

CLI

curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" -I http://find.icaew.com/data/imgs/736c476534ddf7b249d806d9aa7b9ee8.png

PHP

private function curl($url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 1);
        $response = array(
            'html' => curl_exec($ch),
            'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
            'contentLength' => curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD),
            'contentType' => curl_getinfo($ch, CURLINFO_CONTENT_TYPE)
        );
        curl_close($ch);
        return $response;
    }

public function parseImage() {
        $imageSrc = pq('img.firm-logo')->attr('src');
        if (!empty($imageSrc)) {
            $newFile = '/Users/firstlast/Desktop/Hashery/test01/imgdump/' . $this->currentListingId . '.png';
            $curl = $this->curl('http://find.icaew.com' . $imgSrc);
            if ($curl['http_code'] == 200) {
                if (file_exists($newFile)) unlink($newFile);
                $fp = fopen($newFile,'x');
                fwrite($fp, $curl['html']);
                fclose($fp);
                return $this->currentListingId;
            } else {
                return 0;
            }
        } else {
            return 0;
        }
    }

When I mentioned content-type=text/html above, I meant that the call to $this->curl() returns a $response array whose contentLength and contentType entries have the values -1 and text/html respectively.

I can imagine this is quite an obscure question, so I've attempted to provide as much context as possible about what is going on and what I'm trying to achieve. Any help in understanding why this happens, and what I can do to resolve it, would be greatly appreciated.
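
One thing worth checking in the parseImage() code above: the src attribute is stored in $imageSrc, but the URL passed to $this->curl() is built from $imgSrc. If $imgSrc is undefined, the request goes to http://find.icaew.com with no path, and a site root would typically answer with an HTML page, which would be consistent with the text/html result. The following is a minimal sketch of a guard against that kind of mistake, assuming a hypothetical helper named buildImageUrl() that is not part of the original code:

```php
<?php
// Hypothetical helper (not part of the original code): join a base URL and a
// path taken from an <img src>, refusing empty paths so that a missing or
// misspelled variable cannot silently turn the request into one for the site root.
function buildImageUrl(string $base, ?string $src): string
{
    if ($src === null || trim($src) === '') {
        throw new InvalidArgumentException(
            'Image src is empty; refusing to request ' . $base
        );
    }

    // Normalise the slash between base and path so exactly one is present.
    return rtrim($base, '/') . '/' . ltrim($src, '/');
}
```

Comparing curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) with the URL used on the command line is another quick way to confirm which address was actually requested.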


Solution

  • If you know exactly what you are getting, then file_get_contents() is much simpler.

    A URL can be used as a filename with this function, provided the URL fopen wrappers are enabled (allow_url_fopen).

    http://php.net/manual/en/function.file-get-contents.php

    It is also worth reading the user comments on that php.net page, which include many examples, known pitfalls, and tricks for using the function.
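
    As a rough illustration of the file_get_contents() approach, here is a minimal sketch. The helper name downloadFile(), the default user agent string, and the 10 second timeout are assumptions made for the example, not something taken from the question:

```php
<?php
// Hypothetical helper: fetch $url and write the raw bytes to $destination.
// Returns the number of bytes written, or false on failure.
function downloadFile(string $url, string $destination, string $userAgent = 'Mozilla/5.0')
{
    // These HTTP context options only take effect for http(s):// URLs.
    $context = stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'timeout'    => 10, // seconds; an arbitrary value for this sketch
        ],
    ]);

    // The @ suppresses the warning on failure; the false check handles it instead.
    $data = @file_get_contents($url, false, $context);
    if ($data === false) {
        return false; // request failed, or URL fopen wrappers are disabled
    }

    // file_put_contents() writes the bytes unchanged, so binary data is safe.
    return file_put_contents($destination, $data);
}
```

    A call would look like downloadFile($imageUrl, $newFile), followed by a check that the return value is not false before treating the file as saved.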