
What's the worst case scenario if one runs individual queries when pgbouncer has pool_mode=transaction?


Let's assume the following config for pgbouncer:

pool_mode=transaction
server_reset_query=discard all
server_reset_query_always=0

If I have a connection that does

begin transaction
...
commit transaction

then it's pretty simple: while the transaction is open, pgbouncer reserves the backend connection for this client connection only.

However, if a client app instead sends

select pg_advisory_lock(123);
begin transaction isolation level serializable
...
commit transaction
select pg_advisory_unlock(123);

is it possible that the backend connection is switched between queries so that backend connection #1 acquires the lock, connection #2 executes the transaction and connection #3 tries to unlock the advisory lock and obviously fails?

(The advisory locking would be used as an optimization for high-load situations where collisions between serialized transactions cause high CPU load on the backend database server due to lots of rolled-back transactions. Normally collisions happen seldom enough that serializable transactions result in lower latency than using explicit locking.)

Here's the only related question I can find: How does pgbouncer behave when transaction pooling is enabled and a single statement is issued? However, it doesn't answer my problem. Reading the answers suggests that the above should work if the advisory lock is taken/released before the timeout since the previous query has been exceeded, but I don't know if that can be trusted.


Solution

  • I tested this with the following pgbouncer config:

    [pgbouncer]
    pool_mode = transaction
    server_reset_query = DISCARD ALL
    server_check_query = select 1
    server_reset_query_always = 0
    
    max_client_conn = 2000
    default_pool_size = 1
    min_pool_size = 1
    
    reserve_pool_size = 0
    reserve_pool_timeout = 5
    
    max_db_connections = 1
    

    and two parallel connections to the same database through pgbouncer. The following uses P1 and P2 as identifiers for the queries sent by each process. If one does a simple test such as:

    P1: set application_name=p1;
    P1: select pg_advisory_lock(42);
    P2: set application_name=p2;
    P1: show application_name;
        p1
    P2: select pg_advisory_lock(42);
    P2: show application_name;
        p2
    

    ... it seems that both connections were able to acquire the same exclusive advisory lock, even though they have unique application_name values and therefore appear to be separate connections! In reality, all commands have been sent to postgres over a single backend connection, so that single connection has acquired advisory lock 42 twice, and everything is only fine if it later releases the same lock twice (logically once per P1 and P2 client connection). This works because session-level exclusive advisory locks can be taken multiple times by the same owner and must be released equally many times before any other session can acquire the same lock.
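
    The reentrant behavior is easy to verify against postgres directly, without pgbouncer in between. A minimal sketch (the key 42 is just an arbitrary example):

    -- one psql session, connected directly to postgres
    select pg_advisory_lock(42);   -- first acquisition
    select pg_advisory_lock(42);   -- same session may re-acquire the same lock
    -- a second session calling pg_advisory_lock(42) now blocks until this
    -- session has called pg_advisory_unlock(42) twice
    select pg_advisory_unlock(42); -- returns true, lock still held once
    select pg_advisory_unlock(42); -- returns true, lock fully released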

    As a result, the following is also possible:

    P1: select pg_advisory_lock(42);
    P2: select pg_advisory_lock(42);
    P1: begin transaction isolation level serializable;
    P1: select 1;
    P1: commit;
    P2: begin transaction isolation level serializable;
    P2: select 1;
    P2: commit;
    P1: select pg_advisory_unlock(42);
    P2: select pg_advisory_unlock(42);
    

    Imagine multiple queries and database changes instead of select 1 above.

    Note that both database client applications acquired the exclusive lock before starting their serializable transactions, and things still didn't work correctly.

    Obviously this kind of situation is only possible because P1 and P2 actually share the same backend connection. Normally one would allow more than one parallel backend connection through pgbouncer, so this kind of code is racy: in practice it will work until you hit high server load, and then you get random failures depending on which queries from different clients happen to be interleaved on the same backend connection.

    As a result, it's definitely NOT safe to do any queries outside the transaction if pool_mode=transaction is active. The option server_reset_query_always=1 cannot really fix the problem and should never be used. If you feel that you need server_reset_query_always=1, you need to use pool_mode=session instead, or you risk random data corruption.

    In addition, pgbouncer seems to be clever enough to fake some connection-specific data. For example, when P1 sets its application_name in the first example above and later queries it after P2 has already set its own application_name on the same connection, P1 still gets the expected result. However, if you monitor pg_stat_activity while this is taking place, the single active backend connection to postgres changes its application_name value each time P1 or P2 sends a query. This makes it appear as if this kind of mixing were okay with pool_mode=transaction.
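
    If you want to watch this happen, a query along the following lines shows the single backend connection flipping its application_name back and forth as P1 and P2 take turns (run it from a separate session connected directly to postgres, not through pgbouncer; 'mydb' is just a placeholder for your database name):

    -- run in a session connected directly to postgres, not via pgbouncer
    select pid, application_name, state, query
    from pg_stat_activity
    where datname = 'mydb';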

    In the end, setting application_name outside the transaction should be safe, but any feature actually implemented on the postgres level is not. Unless you're absolutely sure that pgbouncer can emulate the feature you need, do not emit any query to a database connection acquired from pgbouncer in pool_mode=transaction except begin ..., commit ... or rollback. Once the transaction is active, the connection has been reserved for you and everything works the same as with a real direct connection to postgres until you do commit or rollback.
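
    If the advisory-lock optimization from the question is still wanted under pool_mode=transaction, one possible approach (not part of the test above, so treat it as a sketch) is to keep the locking inside the transaction using the transaction-scoped pg_advisory_xact_lock(), which postgres releases automatically at commit or rollback, so the lock can never leak onto another client's backend connection:

    begin transaction isolation level serializable;
    -- transaction-scoped advisory lock: held by this backend until
    -- commit/rollback, never outside the transaction that pgbouncer
    -- has pinned to a single backend connection
    select pg_advisory_xact_lock(123);
    -- ... the actual queries and updates ...
    commit;

    Note that the transaction's snapshot is taken at its first statement, which here is the lock call itself, so the effect on the serialization-failure rate may differ from taking the lock before begin; the pattern is safe with transaction pooling, but it is not necessarily a drop-in replacement for the original optimization.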

    I would really like pgbouncer to always return an error any time a client tries to run a query outside a transaction when pool_mode=transaction has been set. Unfortunately, this is not the reality we live in, and pgbouncer clients have to be careful instead.