Search code examples
phpnginxcachingfile-get-contentssymlink

file_get_contents getting wrong results


Update

I solved the problem and posted an answer. However, my solution isn't 100% ideal. I would much rather only remove the symlink from the cache with clearstatcache(true, $target) or clearstatcache(true, $link) but that doesn't work.

I would also much rather prevent the caching of symlinks in the first place or remove the symlink from the cache immediately after generating it. Unfortunately, I had no luck with that. For some reason clearstatcache(true) after creating a symlink does not work, it still gets cached.

I will happily award the bounty to anyone that can improve my answer and solve those issues.

Edit

I've attempted to optimize my code by generating a file everytime clearstatcache is run, so that I only need to clear the cache once for each symlink. For some reason, this does not work. clearstatcache needs to be called every time a symlink is including in the path, but why? There must be a way to optimize the solution I have.


I am using PHP 7.3.5 with nginx/1.16.0. Sometimes file_get_contents returns the wrong value when using a symlink. The problem is after deleting and recreating a symlink, its old value remains in the cache. Sometimes the correct value is returned, sometimes the old value. It appears random.

I've tried to clear the cache or prevent caching with:

function symlink1($target, $link)
{
    realpath_cache_size(0);
    symlink($target, $link);
    //clearstatcache(true);
}

I don't really want to disable caching but I still need 100% accuracy with file_get_contents.

Edit

I am unable to post my source code, as it is way too long and complex, so I have created a minimal, reproducible example (index.php) that recreates the problem:

<h1>Symlink Problem</h1>
<?php
    $dir = getcwd();
    if (isset($_POST['clear-all']))
    {
        $nos = array_values(array_diff(scandir($dir.'/nos'), array('..', '.')));
        foreach ($nos as $no)
        {
            unlink($dir.'/nos/'.$no.'/id.txt');
            rmdir($dir.'/nos/'.$no);
        }
        foreach (array_values(array_diff(scandir($dir.'/ids'), array('..', '.'))) as $id)
            unlink($dir.'/ids/'.$id);
    }
    if (!is_dir($dir.'/nos'))
        mkdir($dir.'/nos');
    if (!is_dir($dir.'/ids'))
        mkdir($dir.'/ids');
    if (isset($_POST['submit']) && !empty($_POST['id']) && ctype_digit($_POST['insert-after']) && ctype_alnum($_POST['id']))
    {
        $nos = array_values(array_diff(scandir($dir.'/nos'), array('..', '.')));
        $total = count($nos);
        if ($total <= 100)
        {
            for ($i = $total; $i >= $_POST['insert-after']; $i--)
            {
                $id = file_get_contents($dir.'/nos/'.$i.'/id.txt');
                unlink($dir.'/ids/'.$id);
                symlink($dir.'/nos/'.($i + 1), $dir.'/ids/'.$id);
                rename($dir.'/nos/'.$i, $dir.'/nos/'.($i + 1));
            }
            echo '<br>';
            mkdir($dir.'/nos/'.$_POST['insert-after']);
            file_put_contents($dir.'/nos/'.$_POST['insert-after'].'/id.txt', $_POST['id']);
            symlink($dir.'/nos/'.$_POST['insert-after'], $dir.'/ids/'.$_POST['id']);
        }
    }
    $nos = array_values(array_diff(scandir($dir.'/nos'), array('..', '.')));
    $total = count($nos) + 1;
    echo '<h2>Ids from nos directory</h2>';
    foreach ($nos as $no)
    {
        echo ($no + 1).':'.file_get_contents("$dir/nos/$no/id.txt").'<br>';
    }
    echo '<h2>Ids from using symlinks</h2>';
    $ids = array_values(array_diff(scandir($dir.'/ids'), array('..', '.')));
    if (count($ids) > 0)
    {
        $success = true;
        foreach ($ids as $id)
        {
            $id1 = file_get_contents("$dir/ids/$id/id.txt");
            echo $id.':'.$id1.'<br>';
            if ($id !== $id1)
                $success = false;
        }
        if ($success)
            echo '<b><font color="blue">Success!</font></b><br>';
        else
            echo '<b><font color="red">Failure!</font></b><br>';
    }
?>
<br>
<h2>Insert ID after</h2>
<form method="post" action="/">
    <select name="insert-after">
        <?php
            for ($i = 0; $i < $total; $i++)
                echo '<option value="'.$i.'">'.$i.'</option>';
        ?>
    </select>
    <input type="text" placeholder="ID" name="id"><br>
    <input type="submit" name="submit" value="Insert"><br>
</form>
<h2>Clear all</h2>
<form method="post" action="/">
    <input type="submit" name="clear-all" value="Clear All"><br>
</form>
<script>
    if (window.history.replaceState)
    {
        window.history.replaceState( null, null, window.location.href );
    }
</script>

It seemed very likely to be a problem with Nginx configuration. Not having these lines can cause the problem:

fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
fastcgi_param DOCUMENT_ROOT $realpath_root;

Here is my Nginx configuration (you can see I have included the above lines):

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name www.websemantica.co.uk;
    root "/path/to/site/root";
    index index.php;

    location / {
        try_files $uri $uri/ $uri.php$is_args$query_string;
    }

    location ~* \.php$ {
        try_files $uri =404;
        fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock;
        fastcgi_param   QUERY_STRING            $query_string;
        fastcgi_param   REQUEST_METHOD          $request_method;
        fastcgi_param   CONTENT_TYPE            $content_type;
        fastcgi_param   CONTENT_LENGTH          $content_length;

        fastcgi_param   SCRIPT_FILENAME         $realpath_root$fastcgi_script_name;
        fastcgi_param   SCRIPT_NAME             $fastcgi_script_name;
        fastcgi_param   PATH_INFO               $fastcgi_path_info;
        fastcgi_param   PATH_TRANSLATED         $realpath_root$fastcgi_path_info;
        fastcgi_param   REQUEST_URI             $request_uri;
        fastcgi_param   DOCUMENT_URI            $document_uri;
        fastcgi_param   DOCUMENT_ROOT           $realpath_root;
        fastcgi_param   SERVER_PROTOCOL         $server_protocol;

        fastcgi_param   GATEWAY_INTERFACE       CGI/1.1;
        fastcgi_param   SERVER_SOFTWARE         nginx/$nginx_version;

        fastcgi_param   REMOTE_ADDR             $remote_addr;
        fastcgi_param   REMOTE_PORT             $remote_port;
        fastcgi_param   SERVER_ADDR             $server_addr;
        fastcgi_param   SERVER_PORT             $server_port;
        fastcgi_param   SERVER_NAME             $server_name;

        fastcgi_param   HTTPS                   $https;

        # PHP only, required if PHP was built with --enable-force-cgi-redirect
        fastcgi_param   REDIRECT_STATUS         200;

        fastcgi_index index.php;
        fastcgi_read_timeout 3000;
    }

    if ($request_uri ~ (?i)^/([^?]*)\.php($|\?)) {
        return 301 /$1$is_args$args;
    }
    rewrite ^/index$ / permanent;
    rewrite ^/(.*)/$ /$1 permanent;
}

Currently I have the above example live at https://www.websemantica.co.uk.

Try adding a few values in the form. It should display Success! in blue every time. Sometimes is shows Failure! in red. It may take quite a few page refreshes to change from Success! to Failure! or vice-versa. Eventually, it will show Success! every time, therefore there must be some sort of caching problem.


Solution

  • There were two issues that caused the problem.

    First issue

    I already posted as and edit in the question. It's a problem with the Nginx configuration.

    These lines:

    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_param DOCUMENT_ROOT $document_root;
    

    needed replaced with:

    fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
    fastcgi_param DOCUMENT_ROOT $realpath_root;
    

    Second issue

    The second issue was I needed to call clearstatcache before calling file_get_contents. I only want to call clearstatcache when it's absolutely necessary, so I wrote a function that only clears the cache when the directory includes a symlink.

    function file_get_contents1($dir)
    {
        $realPath = realpath($dir);
        if ($realPath === false)
            return '';
        if ($dir !== $realPath)
        {
            clearstatcache(true);
        }
        return file_get_contents($dir);
    }