
How can I force PHP's fopen() to return the current version of a web page?


The current content of this Google Docs page is:

(Screenshot: http://www.deviantsart.com/upload/i9k01q.png)

However, when reading this page with the following PHP fopen() script, I get an older, cached version:

(Screenshot; source: deviantsart.com)

I've tried two solutions proposed in another question (appending a random query parameter and using POST), and I also tried clearstatcache(), but I always get the cached version of the web page.

What do I have to change in the following script so that fopen() returns the current version of the web page?

<?php
$url = 'http://docs.google.com/View?id=dc7gj86r_32g68627ff&rand=' . getRandomDigits(10);

echo $url . '<hr/>';
echo loadFile($url);

function loadFile($sFilename) {
    clearstatcache();
    if (floatval(phpversion()) >= 4.3) {
        $sData = file_get_contents($sFilename);
    } else {
        if (!file_exists($sFilename)) return -3;

        $opts = array('http' =>
          array(
            'method'  => 'POST',
            'content'=>''
          )
        );
        $context  = stream_context_create($opts);                

        $rHandle = fopen($sFilename, 'r', $context);
        if (!$rHandle) return -2;

        $sData = '';
        while(!feof($rHandle))
            $sData .= fread($rHandle, filesize($sFilename));
        fclose($rHandle);
    }
    return $sData;
}

function getRandomDigits($numberOfDigits) {
 $r = "";
 for($i=1; $i<=$numberOfDigits; $i++) {
  $nr=rand(0,9);
  $r .=  $nr;
 }
 return $r;
}

?>

ADDED: Taking out the $opts and $context gives me a cached page as well:

function loadFile($sFilename) {
    if (floatval(phpversion()) >= 4.3) {
        $sData = file_get_contents($sFilename);
    } else {
        if (!file_exists($sFilename)) return -3;              

        $rHandle = fopen($sFilename, 'r');
        if (!$rHandle) return -2;

        $sData = '';
        while(!feof($rHandle))
            $sData .= fread($rHandle, filesize($sFilename));
        fclose($rHandle);
    }
    return $sData;
}

ADDED: This curl script, which sends a Firefox user agent, returns the cached version as well:

<?php
$url = 'http://docs.google.com/View?id=dc7gj86r_32g68627ff';
//$user_agent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)';
$user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)';
$ch = curl_init();
//curl_setopt($ch, CURLOPT_COOKIEJAR, "/tmp/cookie");
//curl_setopt($ch, CURLOPT_COOKIEFILE, "/tmp/cookie");
curl_setopt($ch, CURLOPT_URL, $url ); 
curl_setopt($ch, CURLOPT_FAILONERROR, 1); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1); 
curl_setopt($ch, CURLOPT_TIMEOUT, 15);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_VERBOSE, 0);
echo curl_exec($ch);
?>
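
None of the scripts above sends any cache-related request headers. As a further experiment, here is a minimal sketch that asks any intermediate cache not to answer from a stored copy. The helper names `noCacheContextOptions` and `fetchWithoutCache` are hypothetical, and whether Google Docs or a proxy in between honors these headers is an assumption, not something confirmed here.

```php
<?php
// Hypothetical helper: HTTP context options that ask caches along the way
// not to answer from a stored copy.
function noCacheContextOptions(): array
{
    return [
        'http' => [
            'method'  => 'GET',
            // Request headers; Pragma is the HTTP/1.0 counterpart of Cache-Control.
            'header'  => "Cache-Control: no-cache\r\nPragma: no-cache\r\n",
            'timeout' => 15,
        ],
    ];
}

// Hypothetical helper: fetches $url using the options above.
// Returns the response body as a string, or false on failure.
function fetchWithoutCache(string $url)
{
    $context = stream_context_create(noCacheContextOptions());
    return file_get_contents($url, false, $context);
}
```

Calling `fetchWithoutCache($url)` with the document URL would replace the `loadFile($url)` call. If the result is still stale, local or proxy caching becomes a less likely explanation.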

Solution

  • I also get this:

    Test One;http://docs.google.com/View?id=dc7gj86r_30dzgzbjch
    Test Two;http://docs.google.com/View?id=dc7gj86r_31dbssfrzx
    

    The "caching" must be happening at Google Docs or, more probably, the problem is on your side (wrong URL?).


    Response headers:

    Set-Cookie: ******
    Content-Type: text/html; charset=UTF-8
    Cache-Control: no-cache, no-store, max-age=0, must-revalidate
    Pragma: no-cache
    Expires: Fri, 01 Jan 1990 00:00:00 GMT
    Date: Sun, 02 May 2010 03:30:29 GMT
    X-Frame-Options: ALLOWALL
    Content-Encoding: gzip
    X-Content-Type-Options: nosniff
    X-XSS-Protection: 1; mode=block
    Content-Length: 3987
    Server: GSE
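
The `Cache-Control` line in the headers above already tells clients and proxies not to reuse a stored copy without revalidation. Here is a minimal sketch that reads header lines like these and reports that. The function names `cacheControlDirectives` and `forbidsUnvalidatedReuse` are hypothetical, and the sample array copies only a few of the headers shown.

```php
<?php
// Hypothetical helper: returns the Cache-Control directive names found in a
// list of raw header lines, lowercased and without their values.
function cacheControlDirectives(array $headerLines): array
{
    $directives = [];
    foreach ($headerLines as $line) {
        // Header names are case-insensitive, so match with stripos().
        if (stripos($line, 'Cache-Control:') !== 0) {
            continue;
        }
        $value = substr($line, strlen('Cache-Control:'));
        foreach (explode(',', $value) as $part) {
            // Keep only the name, e.g. "max-age" from "max-age=0".
            $name = strtolower(trim(explode('=', $part, 2)[0]));
            if ($name !== '') {
                $directives[] = $name;
            }
        }
    }
    return $directives;
}

// Hypothetical helper: true if the response says a stored copy must not be
// kept (no-store) or must not be reused without revalidation (no-cache).
function forbidsUnvalidatedReuse(array $headerLines): bool
{
    $directives = cacheControlDirectives($headerLines);
    return in_array('no-store', $directives, true)
        || in_array('no-cache', $directives, true);
}

// A few of the header lines from the response above.
$sampleHeaders = [
    'Content-Type: text/html; charset=UTF-8',
    'Cache-Control: no-cache, no-store, max-age=0, must-revalidate',
    'Pragma: no-cache',
];
var_dump(forbidsUnvalidatedReuse($sampleHeaders));
```

After a `file_get_contents()` call on an HTTP URL, PHP's HTTP wrapper exposes the response header lines in `$http_response_header`, which could be passed to the function in place of the sample array.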