Search code examples
pythonpostgresqlpsycopg2

Postgres closes connection during query after a few hundred seconds when using Psycopg2


I'm running PostgreSQL 9.6 (in Docker, using the postgres:9.6.13 image) and psycopg2 2.8.2.

My PostgreSQL server (local) hosts two databases. My goal is to create materialized views in one of the databases that use data from the other database using Postgres's foreign data wrappers. I do all this from a Python script that uses psycopg2.

This works well as long as creating the materialized view does not take too long (i.e. if the amount of data being imported isn't too large). However, if the process takes longer than roughly ~250 seconds, psycopg2 throws the exception

psycopg2.OperationalError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

No error message (or any message concerning this whatsoever) can be found in Postgres's logs.

Materialized view creation completes successfully if I do it from an SQL client (Postico).

This code illustrates roughly what I'm doing in the Python script:

db = pg.connect(
    dbname=config.db_name,
    user=config.db_user,
    password=config.db_password,
    host=config.db_host,
    port=config.db_port
)
with db.cursor() as c:
    c.execute("""
        CREATE EXTENSION IF NOT EXISTS postgres_fdw;
        CREATE SERVER fdw FOREIGN DATA WRAPPER postgres_fdw OPTIONS (...);
        CREATE USER MAPPING FOR CURRENT_USER SERVER fdw OPTIONS (...);
        CREATE SCHEMA foreign;
        IMPORT FOREIGN SCHEMA foreign_schema FROM SERVER fdw INTO foreign;
    """)
    c.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS my_view AS (
            SELECT (...)
            FROM foreign.foreign_table
        );
    """)

Solution

  • Adding the keepalive parameters to the psycopg2.connect call seems to have solved the problem:

    self.db = pg.connect(
                dbname=config.db_name,
                user=config.db_user,
                password=config.db_password,
                host=config.db_host,
                port=config.db_port,
                keepalives=1,
                keepalives_idle=30,
                keepalives_interval=10,
                keepalives_count=5
            )
    

    I still don't know why this is necessary. I can't find anyone else who has described having to use the keepalives parameter keywords when using Postgres in Docker just to be able to run queries that take longer than 4-5 minutes, but maybe it's obvious enough that nobody has noted it?