I have a dedicated server running Ubuntu 20.04 with cPanel 106.11, MySQL 8, PHP 8.1 and Elasticsearch 7.17.8, and I run Magento 2.4.5-p1 on it. ConfigServer Security & Firewall is enabled. Every couple of days I get a monitoring alert saying my server doesn't respond to ping, and the host has to do a hard reboot. They are getting frustrated with this and say they will turn off monitoring unless I sort it out, as they have checked all the hardware and it is fine. This happens at different times, usually overnight.
I have looked through syslog, the MySQL log, the Elasticsearch log, the Magento 2 logs, the Apache log and kern.log, and I can't find the cause of the issue. I have enabled "sar"; around the time of the crashes RAM usage is 64% and CPU usage is between 5% and 10%.
What else can I look at to try and diagnose this issue?
Additional info requested by Wilson:
select count - https://justpaste.it/6zc95
show global status - https://justpaste.it/6vqvg
show global variables - https://justpaste.it/cb52m
full process list - https://justpaste.it/d41lt
status - https://justpaste.it/9ht1i
show engine innodb status - https://justpaste.it/a9uem
top -b -n 1 - https://justpaste.it/4zdbx
top -b -n 1 -H - https://justpaste.it/bqt57
ulimit -a - https://justpaste.it/5sjr4
iostat -xm 5 3 - https://justpaste.it/c37to
df -h, df -i, free -h and cat /proc/meminfo - https://justpaste.it/csmwh
htop - https://freeimage.host/i/HAKG0va
The server uses NVMe drives and has 32GB RAM and 6 cores. MySQL runs on the same server as LiteSpeed.
The server has not gone down again since posting this, but the datacentre usually reboots it within 15-20 minutes, and 99% of the time it happens overnight. The server is not accessible over SSH when it crashes.
Rate Per Second = RPS, Rate Per Hour = RPHr
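For readers unfamiliar with these abbreviations, here is a minimal sketch of how such rates are typically derived: a cumulative counter from SHOW GLOBAL STATUS divided by the server's Uptime in seconds. The function name `rates` and the sample numbers are illustrative assumptions, not values taken from the linked output.

```python
def rates(counter, uptime_seconds):
    """Return (per-second, per-hour) average rates for a cumulative status counter."""
    if uptime_seconds <= 0:
        raise ValueError("uptime_seconds must be positive")
    per_second = counter / uptime_seconds
    per_hour = counter * 3600 / uptime_seconds
    return per_second, per_hour


# Hypothetical example: a counter that reached 150 after 2 hours of uptime.
rps, rphr = rates(counter=150, uptime_seconds=7200)
print(f"{rps:.4f} per second, {rphr:.0f} per hour")
```

These are averages since the last restart, so a short burst overnight can be hidden by a long quiet uptime.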
Suggestions to consider for your instance (these should be available in your cPanel, as they are all dynamic variables):
connect_timeout=30 # from 10 seconds to reduce aborted_connects RPHr of 75
innodb_io_capacity=900 # from 200 to use more of the NVMe IOPS capacity
thread_cache_size=36 # from 9 to reduce threads_created RPHr of 75
read_rnd_buffer_size=32768 # from 256K to reduce handler_read_rnd_next RPS of 5,805
read_buffer_size=524288 # from 128K to reduce handler_read_next RPS of 5,063
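Because these are dynamic variables, they can also be tried at runtime without restarting MySQL. The sketch below only generates the corresponding SET GLOBAL statements from the values listed above; the helper name `set_global_statements` is an illustrative assumption, and the output would still need to be run in a MySQL session by a user with sufficient privileges.

```python
# Values copied from the suggestions above.
SUGGESTED = {
    "connect_timeout": 30,
    "innodb_io_capacity": 900,
    "thread_cache_size": 36,
    "read_rnd_buffer_size": 32768,
    "read_buffer_size": 524288,
}


def set_global_statements(settings):
    """Build one SET GLOBAL statement per variable, in the order given."""
    return [f"SET GLOBAL {name} = {value};" for name, value in settings.items()]


for statement in set_global_statements(SUGGESTED):
    print(statement)
```

SET GLOBAL changes are lost on the next MySQL restart, so values that prove useful should also go into the configuration file (or be applied with SET PERSIST on MySQL 8). For variables that also have a session scope, such as read_buffer_size and read_rnd_buffer_size, the new global value only applies to connections opened after the change.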
Many more opportunities exist to improve performance of your instance. View profile for contact info, please. We are pushing the one question/one answer planned for this platform.