Search code examples
phpfpmopcachephp-deployer

What is causing a segfault when Opcache file cache is enabled in a symlink deployment?


I'm using PHP deployer which is a symlink based deployment tool which calls opcache:reset after deployment.

Recently I'm getting a segfault in my PHP-FPM processes after deployment.

This manifests as segfault entires in the PHP logs/memory exhausted warnings being output or 500/503 errors from Apache.

This has worked fine for years, so I'm stumped as to what has caused this to happen now.

I can reproduce this on every page refresh of an affected pool after the first segfault.

If I restart the PHP-FPM process, the page is served correctly the first time and then subsequently I get a segfault.

I suspect this is related to Opcache, since if I delete the Opcache cache files and restart PHP-FPM, the problem goes away. If I disable opcache.file_cache, again there is no problem. Note that I'm not exclusively using the file cache.

When I (unscientifically) checked the opcache version of files reported by PHP during the memory exhaustion error, I noticed they seemed to be truncated or missing data that existed in previous versions of the opcache file where the segfault did not occur. When I removed those specific opcache files and restarted PHP-FPM the problem went away. So it seems, I think, that the opcache files are getting corrupted between deploys.

Environment:

  • CentOS 7.9.2009
  • PHP 7.4.15
  • Each pool is configured to run under its own user
  • Opcache files are stored in /var/www/vhosts/[project]/opcache
  • SELinux is enabled

My opcache config:

zend_extension=opcache.so
opcache.enable=1
opcache.enable_cli=0
opcache.memory_consumption=96
opcache.interned_strings_buffer=16
opcache.max_accelerated_files=4096
opcache.max_wasted_percentage=5
opcache.validate_timestamps=1
opcache.revalidate_path=0
opcache.revalidate_freq=2
opcache.max_file_size=0

The core dump:

GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/php-fpm...Reading symbols from /usr/lib/debug/usr/sbin/php-fpm.debug...done.
done.
[New LWP 6221]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `php-fpm: pool cpsastag         '.
Program terminated with signal 11, Segmentation fault.
#0  ZEND_BIND_STATIC_SPEC_CV_UNUSED_HANDLER () at /usr/src/debug/php-7.4.15/Zend/zend_vm_execute.h:46883
46883           zval *value;
(gdb) bt
#0  ZEND_BIND_STATIC_SPEC_CV_UNUSED_HANDLER () at /usr/src/debug/php-7.4.15/Zend/zend_vm_execute.h:46883
#1  0x00005559a7261159 in execute_ex (ex=0x0) at /usr/src/debug/php-7.4.15/Zend/zend_vm_execute.h:57657
#2  0x00005559a7266930 in zend_execute (op_array=0x7f03fb075000, return_value=<optimized out>) at /usr/src/debug/php-7.4.15/Zend/zend_vm_execute.h:57957
#3  0x00005559a71e1a63 in zend_execute_scripts (type=type@entry=8, retval=0x7f03fb016330, retval@entry=0x0, file_count=file_count@entry=3) at /usr/src/debug/php-7.4.15/Zend/zend.c:1679
#4  0x00005559a717fd00 in php_execute_script (primary_file=primary_file@entry=0x7ffd44c9b940) at /usr/src/debug/php-7.4.15/main/main.c:2621
#5  0x00005559a6fe582b in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/php-7.4.15/sapi/fpm/fpm/fpm_main.c:1939
(gdb)

SELinux audit.log

type=ANOM_ABEND msg=audit(1614726077.349:20549): auid=4294967295 uid=1010 gid=1011 ses=4294967295 subj=system_u:system_r:httpd_t:s0 pid=825 comm="php-fpm" reason="memory violation" sig=11

I'd really appreciate any pointers on what could be wrong or how to debug further.


Solution

  • Overview

    Our issue was not involving a symlink, but otherwise had similar errors and server configuration.

    The segfault appears related to PHP's opcache files (opcache.file_cache) not being properly cleared, or else not syncing with the in-memory RAM opcache.

    In our case, the opcache reset was triggered immediately after upgrading to Wordpress 6.3.1 and doing a 'Purge all Caches' with W3 Total Cache 2.4.1.

    We do have SELinux enabled, and that's always a possible culprit, but this worked for years with SELinux, so it's unclear if that's contributing or if its logs are just capturing a 'memory violation' from a crash on a file it's monitoring.

    Solution

    Turning off the secondary opcache file-based caching for all sites / PHP-FPM pools by commenting out the opcache.file_cache in each active pool configuration file in /etc/php-fpm.d/. Then, reloading PHP-FPM (and Apache for good measure).

    Secondary file-based opcache (via opcache.file_cache) is not necessary for most smaller servers, so this may be an acceptable solution / fix. You lose the benefits of the file caching, which I believe is mainly for quicker startups on server or service restarts.

    Workarounds

    • Workaround 1: Disabling Opcache by setting opcache.enable=0 in the /etc/php.d/10-opache.ini (your opcache .ini file may have a similar, but different name) and reloading PHP-FPM (and Apache for good measure) makes the failures stop. Though, the sites will be slower with opcache disabled, so this option is not a desirable permanent workaround.
    • Workaround 2: We didn't try this, but disabling opcache in any / all W3 Total Cache plugins (General Settings) may be worth a try.
    • Workaround 3: Reverting any Wordpress and W3 Total Cache updates

    Symptoms

    • Upon doing a "Purge all Caches" in W3 Total Cache on the Wordpress site PHP-FPM became unstable for all pools / websites with 503 or white page errors in the web browser.
    • SELinux Log (/var/log/audit/audit.log) includes - subj=system_u:system_r:httpd_t:s0 pid=18566 comm="php-fpm" reason="memory violation" sig=11
    • PHP-FPM Logs (/var/log/php-fpm) includes - php-fpm[13454]: segfault at 563eda614e70 ip 00007f2bd7313a8c sp 00007ffd389af5f0 error 6 in opcache.so[7f2bd72f5000+b8000]
    • Apache Logs (/var/log/httpd) includes - AH01067: Failed to read FastCGI header
    • Linux System Log (/var/log/messages) - opcache exited on signal 11 (SIGSEGV) opcache.file_cache
    • Reinstalling PHP, PHP-FPM, and Apache did not fix the issues

    Server Configuration

    • PHP-FPM using separate pools for each site
    • Each PHP-FPM pool had opcache.file_cache enabled with a custom path
    • One website using Wordpress version 6.3.1 with the W3 Total Cache plugin version 2.4.1
    • Default PHP Opcode caching opcache.so enabled with jit_buffer_size=0 (disabled)
    • CentOS 7
    • Apache/2.4.57
    • PHP 8.1.23 via Remi repo
    • Multiple sites, including Wordpress, Laravel, and Drupal on the same server

    Diagnosing

    Log Checking Examples

    • Check Apache logs for AH01067 grep -r AH01067 /var/log/httpd/* | tail -n20
    • Check PHP-FPM logs for SIGSEGV grep -r SIGSEGV /var/log/php-fpm/* | tail -n20
    • Check SELinux logs for memory violation or opcache, if it's installed, (usually only for RHEL, CentOS, Fedora) grep 'memory violation' /var/log/audit/audit.log* | tail -n 20 or grep 'opcache' /var/log/audit/audit.log* | tail -n 20
    • Check Linux system log for opcache grep -r opcache /var/log/messages | tail -n20

    Notes

    • Your log files may be in different directories, or have different file names. It depends on where your operating system stores the logs. The log paths above are common for the Redhat (RHEL) / CentOS / Fedora family of Linux.
    • The | tail -n 20 will show only the last 20 instances. You can remove this part to see all results, or change the 20 to some other number.
    • SELinux log timestamps can be converted into human-readable times by running the date script on a Linux command-line, like date -d@1693851386, where 1693851386 is the timestamp value in the log.
    • SELinux log may refer to an inode, instead of a file or directory. The inode ID in the log can be mapped to a file using find / -inum 63377429, where 63377429 is the inode value in the log.