"MySQL server has gone away" while iterating unbuffered query with PHP 7.4 PDO on local MySQL 5.7 database InnoDB

My PHP 7.4 script select a large InnoDB table from a local MySQL 5.7 database and iterates every line using PDO. I'm using unbuffered queries by purpose as the whole table won't fit server memory.

The query is a simple SELECT * FROM table WHERE field_a = ... ORDER BY field_b, field_c.

The code iterating the result is:

$rows = $db->prepare('redacted');
$rows->bindParam(':redacted', $redacted);
$rows->execute();

foreach($rows as $r) {
    if($r->fieldname != redacted)
        doWork1();
    
    doWork2();
}

doWork2() and it always does the following job, using a separate, buffered PDO connection:

updates the row on the db;
in a transaction, insert few new rows in another table (one INSERT query per row).

Every iteraction doesn't take more than 1-2 seconds, but the whole loop could take hours to complete.

The script works fine, but exactly every 12:00 minutes, PDO throws the following exception:

[03-Feb-2023 08:34:29 Europe/Rome] PHP Fatal error:  Uncaught PDOException: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away in redacted:176
Stack trace:
redacted
[03-Feb-2023 08:46:50 Europe/Rome] PHP Fatal error:  Uncaught PDOException: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away in redacted:176
Stack trace:
redacted

Line number is always the foreach or the if.

mysqld.log says:

2023-02-03T08:36:33.480661+01:00 781575 [Note] Aborted connection 781575 to db: 'redacted' user: 'redacted' host: 'localhost' (Got timeout writing communication packets)
2023-02-03T08:48:56.534865+01:00 781768 [Note] Aborted connection 781768 to db: 'redacted' user: 'redacted' host: 'localhost' (Got timeout writing communication packets)

Upon restarting the script, it works fine on the same data, so it isn't a problem with values or datatypes. mysqladmin version shows the server never crashed.

I suppose it is due to a sort of timeout.

I noticed the timestamp in mysql log is about 2 minutes later the PHP fatal error, so the disconnection due to writing timeout is the consequence, not the cause. For this reason, increasing net_read_timeout and net_write_timeout (120 seconds), wait_timeout (8 hours), max_allowed_packet (256 MB) didn't help. I reviewed carefully all server variables and I didn't have any value near 12 minutes or 720 seconds.

The query is fast (less than 3 seconds) and optimized; no big fields; it uses indexes; the query EXPLAIN is (relevant part): 594802 | 5.67 | Using index condition; Using where; Using filesort. I'm not in cases mentioned at https://dev.mysql.com/doc/refman/8.0/en/gone-away.html

The machine is a AWS Lightsail with Linux and cPanel.

#  mysqladmin version
mysqladmin  Ver 8.42 Distrib 5.7.40, for Linux on x86_64
redacted
Uptime:                 10 days 10 hours 34 min 13 sec
redacted
Threads: 6  Questions: 15310531  Slow queries: 0  Opens: 9989  Flush tables: 1  Open tables: 1854  Queries per second avg: 16.972

I enabled query log and inspected carefull around time of the error and all queries are correct.

Solution

I was surprised when I was able to replicate this so easily. I created three separate PDO connections to my db server -

for the unbuffered query
run regular calls to check on the status of the thread handling the unbuffered query
run updates based on rows being fetched from the unbuffered query

Without making any config changes, the thread handling the unbuffered query was being closed by the server after 16 minutes. I was unable to find anything in the session vars that would suggest a timeout at 16 minutes.

The solution in my case was to increase the net_write_timeout to be long enough for the entire unbuffered result to be fetched, row-by-row. I increased it to 30mins, 60mins and then 2 hours. Each time the connection for the unbuffered query result remained open until the specified timeout elapsed (1 - 6 minutes after).

I then set it to 36000 (10 hrs) and it has now been running for just over four hours -

SET SESSION net_write_timeout = 36000

I suspect using smaller buffered queries (num of rows dependent on your average row size) to select the data from the source table would be a better solution than holding this connection open for the unbuffered query for hours on end.

The only recent article I came across that discusses unbuffered queries and the effect of net_write_timeout is this one by Hayden James. It only touched on net_write_timeout in passing.