
Curl header request returning 404 but body returning 200


I'm sending a header request with cURL using the following code:

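function getContentType($u)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $u);
    curl_setopt($ch, CURLOPT_HEADER, true);          // include response headers in the output
    curl_setopt($ch, CURLOPT_NOBODY, true);          // skip the body (this makes the request a HEAD)
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the output instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1");

    $response = curl_exec($ch);
    curl_close($ch);
    if ($response === false) {
        return null; // the request itself failed (DNS, connection, timeout)
    }

    // split() was removed in PHP 7; split on CRLF or LF instead
    $results = preg_split('/\r?\n/', trim($response));
    print_r($results); // debug: dump the raw header lines

    $type = null;
    foreach ($results as $line) {
        // Header names are case-insensitive. Keep the last match, which after
        // redirects normally belongs to the final response.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2 && strcasecmp(trim($parts[0]), 'Content-Type') === 0) {
            $type = trim($parts[1]);
        }
    }
    return $type;
}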
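
Because CURLOPT_FOLLOWLOCATION is enabled, the text returned by curl_exec() can contain one header block per response in the redirect chain, so the first Content-Type line found may belong to a 301/302 rather than to the page itself. The sketch below (the function name lastContentType is hypothetical) assumes $raw is that header-only output and returns the Content-Type of the final response only, resetting whenever a new status line starts.

```php
<?php
// Hypothetical helper: $raw is assumed to be the string curl_exec() returns
// with CURLOPT_HEADER and CURLOPT_NOBODY set. With CURLOPT_FOLLOWLOCATION it
// may contain several header blocks, one per response in the redirect chain.
function lastContentType(string $raw): ?string
{
    $type = null;
    foreach (preg_split('/\r?\n/', $raw) as $line) {
        // A status line (e.g. "HTTP/1.1 200 OK" or "HTTP/2 200") starts a new
        // response, so forget anything seen in earlier blocks
        if (preg_match('#^HTTP/\S+\s+\d{3}#', $line)) {
            $type = null;
            continue;
        }
        // Header names are case-insensitive; split on the first colon only
        $parts = explode(':', $line, 2);
        if (count($parts) === 2 && strcasecmp(trim($parts[0]), 'Content-Type') === 0) {
            $type = trim($parts[1]);
        }
    }
    return $type;
}
```

For a 301 carrying `Content-Type: text/html` followed by a 200 carrying `content-type: application/pdf`, this returns `application/pdf`. If you would rather not parse headers at all, curl_getinfo($ch, CURLINFO_CONTENT_TYPE) returns the content type cURL itself recorded.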

For most websites this returns the correct content type, but some servers respond with a 404 error even though the page is actually available. I assume these servers are configured to reject header-only requests.

I'm looking for a way to get around this rejection, or at least a way to tell when the header request has been rejected rather than the page genuinely not existing (a real 404).


Solution

  • Setting CURLOPT_NOBODY to true with curl_setopt() sets the request
    method to HEAD for HTTP(S) requests, and cURL does not read any
    content even if the response includes a Content-Length header.
    However, depending on the libcurl version, setting CURLOPT_NOBODY back
    to false does *not* reliably reset the request method to GET. Because
    CURLOPT_NOBODY is now false, cURL will then wait for a body whenever
    the response contains a Content-Length header. To switch back to GET
    explicitly, set CURLOPT_HTTPGET to true.

    My guess is that your code sends a HEAD request rather than a GET, and those servers reject HEAD requests by answering 404 instead of serving the page.
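
Following that guess, one workaround is to keep the cheap HEAD request as a first attempt and, when the server answers with an error status, repeat the same request as a plain GET, switching the method back with CURLOPT_HTTPGET rather than only clearing CURLOPT_NOBODY. This is a minimal sketch, not a definitive implementation: the names getContentTypeWithFallback and probe are hypothetical, and treating any status of 400 or above as "possibly a rejected HEAD" is an assumption.

```php
<?php
// Hypothetical helper: performs one request on $ch and returns
// [HTTP status (0 if no response), Content-Type or null].
function probe($ch): array
{
    curl_exec($ch);
    return [
        (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
        curl_getinfo($ch, CURLINFO_CONTENT_TYPE) ?: null,
    ];
}

// Hypothetical helper: HEAD first, then GET if the HEAD answer is an error.
function getContentTypeWithFallback(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // never echo the body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);

    // First attempt: CURLOPT_NOBODY turns the request into a HEAD
    curl_setopt($ch, CURLOPT_NOBODY, true);
    [$status, $type] = probe($ch);

    if ($status >= 400) {
        // Assumption: an error here may mean the server rejects HEAD.
        // Clear NOBODY and force GET explicitly with CURLOPT_HTTPGET.
        curl_setopt($ch, CURLOPT_NOBODY, false);
        curl_setopt($ch, CURLOPT_HTTPGET, true);
        [$status, $type] = probe($ch);
    }

    // Only trust the content type of a successful (2xx) response
    return ($status >= 200 && $status < 300) ? $type : null;
}
```

Comparing the two status codes also answers the second part of the question: if HEAD returns 404 but GET returns 200, the server rejected the HEAD request rather than the page being missing. Note that the GET fallback downloads the full body, so it costs more than the HEAD request it replaces.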