Search code examples
djangopostgresqlgunicorngevent

Slow performing postgresql commit statements


In a Django web app with a postgresql backend (and nginx reverse proxy with gunicorn application server), I'm seeing dozens of COMMIT messages in postgresql's slow log. Behold:

2020-02-01 17:56:16.335 UTC [19424] ubuntu@app LOG LOG:  duration: 175.630 ms  statement: COMMIT
2020-02-01 17:56:21.355 UTC [19435] ubuntu@app LOG LOG:  duration: 107.735 ms  statement: COMMIT
2020-02-01 17:57:22.592 UTC [19419] ubuntu@app LOG LOG:  duration: 235.313 ms  statement: COMMIT
2020-02-01 17:57:30.685 UTC [19419] ubuntu@app LOG LOG:  duration: 249.875 ms  statement: COMMIT
2020-02-01 17:57:30.688 UTC [19424] ubuntu@app LOG LOG:  duration: 99.049 ms  statement: COMMIT
2020-02-01 17:57:30.688 UTC [19435] ubuntu@app LOG LOG:  duration: 115.772 ms  statement: COMMIT
2020-02-01 17:57:30.688 UTC [19554] ubuntu@app LOG LOG:  duration: 248.656 ms  statement: COMMIT
2020-02-01 17:58:03.266 UTC [19435] ubuntu@app LOG LOG:  duration: 780.232 ms  statement: COMMIT
2020-02-01 17:58:03.270 UTC [19424] ubuntu@app LOG LOG:  duration: 622.424 ms  statement: COMMIT
2020-02-01 17:58:07.579 UTC [19435] ubuntu@app LOG LOG:  duration: 75.658 ms  statement: COMMIT

The DB in question was just migrated from one dedicated server to another a day ago. In its previously environment, COMMIT never showed up in slow log. In the new environment, I made some changes:

1) I set checkpoint_completion_target = 0.7 (down from checkpoint_completion_target = 0.8 previously)

2) I switched to gevent workers in gunicorn (previously used sync workers). This also entailed I add the following to postgresql (allowing us to use psycopg2 asynchronously with gevent - source):

from psycogreen.gevent import patch_psycopg

def post_fork(server, worker):
  from gevent import monkey

  patch_psycopg()
  worker.log.info("Made PostgreSQL Green!")
  monkey.patch_all()

3) Increased max_connections to 400 (from 300 previously)

4) Increased shared_buffers to 8GB (from 6GB previously)

And also a few other changes where I increased the thresholds (which ought to be a good thing).

Can an expert suggest whether the changes I've made above might have led to all the COMMIT statements littering my slow log? If not, any other guesses?

And most importantly, what are some mitigation steps I can take to ameliorate this situation?


vmstat 1 yields:

ubuntu@main-app:~$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
16  0      0 10031160 263796 11650344    0    0     2    20    5    4 13  1 86  0  0
 2  0      0 10030184 263796 11650308    0    0     0  1832 8598 9433 20  1 78  0  0
 4  1      0 10028432 263796 11650416    0    0    40   436 7229 7887 20  1 79  0  0
 6  0      0 10027076 263796 11650456    0    0     0   764 7836 8833 17  1 81  0  0
 4  0      0 10025904 263796 11650464    0    0     0   588 7947 9754 22  1 76  0  0
 4  0      0 10025988 263796 11650512    0    0     0   652 9727 12033 30  2 68  0  0
 7  0      0 10025572 263796 11650532    0    0     0   816 8296 9984 25  1 73  0  0
 2  1      0 10025680 263796 11650596    0    0     0  1128 8794 11003 23  1 75  0  0
 0  0      0 10025552 263796 11650588    0    0     0   288 7153 8091 20  1 79  0  0
 6  0      0 10025096 263796 11650612    0    0     0   412 9423 12016 25  2 73  0  0
 1  0      0 10025056 263796 11650640    0    0     0   240 9227 11442 32  2 66  0  0
 6  0      0 10025056 263796 11650800    0    0    32  1036 8762 10418 25  2 73  0  0
 2  0      0 10025116 263796 11650828    0    0     0   352 8730 10924 23  2 75  0  0
 6  0      0 10024992 263796 11650940    0    0     0   592 7920 9399 14  1 85  0  0
 3  0      0 10024288 263796 11650952    0    0     0   380 8380 9662 23  1 75  0  0
 4  0      0 10024536 263796 11650896    0    0     0   680 9193 10819 22  1 76  0  0
 1  0      0 10024720 263796 11650776    0    0     0   588 9655 10757 24  2 74  0  0
 5  0      0 10024820 263796 11650700    0    0     0    48 10237 13216 28  2 70  0  0
 1  0      0 10023660 263796 11650716    0    0     0   396 9291 11251 34  2 64  0  0
 6  0      0 10024564 263796 11650744    0    0     0   720 8557 10500 22  1 76  0  0

Solution

  • There are two likely explanations for slow commits:

    1. Overloaded storage. This can be caused by

      • The high I/O volume.

      • Lots of small transactions, where the WAL sync requests are just too many.

    2. WITH HOLD cursors on larger queries.

    On Linux, check the %iowait column in vmstat 1 to see if the I/O subsystem is overloaded.

    About your measures:

    • Increasing max_connections or decreasing checkpoint_completion_target will have an adverse effect, if anything.

    • Increasing shared_buffers will help if the problem is the amount of read I/O.

    • If the problem are the many sync requests, and you can afford to lose a few committed transactions in the event of a crash, set synchronous_commit = off.

      If that is not an option, you can play with commit_delay to get the I/O load down.

    I have never heard of "gevent workers", so I cannot say anything about them.