I have been having intermittent issues on some servers running Arch Linux with php-fpm 5.3.9 over FastCGI on Cherokee 1.2.101. I am using a caching plugin that builds and serves static cache files using logic like:
$cache_file = md5($host . $uri) . '.cache';
if ( file_exists($cache_file) ) {
    $cache_file_contents = file_get_contents($cache_file);
    exit( $cache_file_contents );
}
// else build/save the $cache_file
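For reference, here is a minimal, self-contained sketch of how the lookup and the "build/save" branch could be written. The helper names (cache_path, cache_store, cache_fetch) and the cache directory argument are hypothetical and not part of the plugin. The write goes through a temporary file plus rename(), an assumption on my part, so that a concurrent request never reads a half-written cache file.

```php
<?php
// Hypothetical helper: cache file path for a host/URI pair,
// using the same md5($host . $uri) key as the snippet above.
function cache_path($cache_dir, $host, $uri) {
    return rtrim($cache_dir, '/') . '/' . md5($host . $uri) . '.cache';
}

// Hypothetical helper: write to a temp file in the same directory,
// then rename() it into place so readers never see partial content.
function cache_store($cache_file, $contents) {
    $tmp = tempnam(dirname($cache_file), 'tmp');
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, $contents) === false) {
        @unlink($tmp);
        return false;
    }
    return rename($tmp, $cache_file);
}

// Hypothetical helper: cached contents, or null on a miss or read error.
function cache_fetch($cache_file) {
    if (!is_file($cache_file)) {
        return null;
    }
    $contents = file_get_contents($cache_file);
    return $contents === false ? null : $contents;
}
```

Returning null on a miss leaves the decision of how to end the request (exit(), return, or an exception) to the caller.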
A few processes will end up in the php-fpm slow log, hanging on that exit() call. At that time the load spikes, CPU usage hits 100% and goes (almost) entirely to the web server, and PHP pages start returning 500 Internal Server Error responses. Sometimes the server recovers on its own; other times I need to restart php-fpm and Cherokee.
I have the FastCGI settings for PHP-FPM configured to do a
Even though this is a VPS, I would tentatively rule out I/O wait on the filesystem, as the cache file should already be loaded. I have not been able to catch it in the act to test with vmstat.
I have pm.max_requests set to 500 but wonder if the exit() call is interfering with the cycling of processes.
The php-fpm log shows a lot of "WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers)" messages. This seems to be a normal part of php-fpm regulating the number of child processes in the pool, though.
Any tips on troubleshooting would be appreciated. Here are 3 things I found that raised some red flags:
http://www.php.net/manual/en/function.exit.php#96930
https://serverfault.com/questions/84962/php-via-fastcgi-terminated-by-calling-exit#85008
I ended up using the Pythonic Exception wrapping method cited in the comments at http://www.php.net/manual/en/function.exit.php
In the main index.php
class SystemExit extends Exception {}
try{
/* Resume loading web-app */
}
catch (SystemExit $e) {}
In the cache logic from the question, replace exit( $cache_file_contents ); with:
while (@ob_end_flush());
flush();
echo $cache_file_contents;
throw new SystemExit();
This has alleviated the php-fpm slow log entries that show hangs on that exit() call. I'm not entirely convinced that it solved the underlying problem, but it has cleaned up the log files.
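To show how the two pieces fit together, here is a minimal, self-contained sketch of the pattern. The function names serve_from_cache and handle_request are hypothetical stand-ins for the cache plugin and the body of index.php; only the SystemExit class comes from the code above. Output-buffer flushing is left out to keep the example short.

```php
<?php
// Marker exception thrown instead of calling exit().
class SystemExit extends Exception {}

// Hypothetical stand-in for the cache plugin: on a hit, emit the cached
// page and unwind to the front controller; on a miss, just return.
function serve_from_cache($cache_file) {
    if (!is_file($cache_file)) {
        return; // cache miss: the caller carries on building the page
    }
    $contents = file_get_contents($cache_file);
    if ($contents === false) {
        return; // unreadable file: treat it as a miss
    }
    echo $contents;
    throw new SystemExit();
}

// Hypothetical front controller body (what index.php would run).
// $build_page is any callable that returns the page as a string.
function handle_request($cache_file, $build_page) {
    try {
        serve_from_cache($cache_file);
        echo call_user_func($build_page);
    } catch (SystemExit $e) {
        // Intentional early stop: fall through and let the script
        // finish normally instead of terminating inside exit().
    }
}
```

Because only SystemExit is caught, any other exception still propagates as before, and the request ends by reaching the end of the script, which is the behaviour this workaround relies on.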