I'm working on some C++ code that makes use of libpq to interact with a PostgreSQL database, and I've started writing some functions, each of which internally starts a transaction, applies several changes to the DB, then ends the transaction.
I would now like to call one of these functions in combination with another DML statement, all executed in the same transaction. Something like this (note: very simplified):
void doFoo(PGconn* con) {
PQexec(con, "BEGIN" );
PQexec(con, "insert into ..." );
PQexec(con, "delete from ..." );
PQexec(con, "END" );
}
void doFooPlus(PGconn* con) {
PQexec(con, "BEGIN" );
doFoo(con);
PQexec(con, "update ..." );
PQexec(con, "END" );
}
void main(void) {
doFooPlus(con);
}
However, based on everything I've read, it looks like PostgreSQL may not honor this kind of nesting of transactions. I want to be clear: I don't need autonomous transactions, which I know PostgreSQL does not support, and I don't need any explicit rollback functionality to a nested (or otherwise) BEGIN statement, which could be accomplished with savepoints, and is obviously not what the above code is trying to do at any point. I just want to confirm whether the above code will do what one would hope it would do from the structure of the code.
Let me try to clarify further. Here is what PostgreSQL will end up seeing from the above C++ (well actually just straight C) code:
BEGIN
BEGIN
insert into ...
delete from ...
END
update ...
END
What I'm worried about is the second BEGIN call being completely ignored, and so the first END call would then end the transaction started by the first BEGIN call, and so the update statement would then not be included in the atomicity of the transaction block.
According to http://www.postgresql.org/docs/9.4/static/sql-begin.html:
Issuing BEGIN when already inside a transaction block will provoke a warning message. The state of the transaction is not affected. To nest transactions within a transaction block, use savepoints (see SAVEPOINT).
The comment about savepoints seems kind of misleading to me; PostgreSQL does not support nested (autonomous) transactions; savepoints only provide a way to roll back to a point in the middle of an existing transaction. Savepoints are not transactions in themselves, so I cannot replace the BEGIN and END in doFoo()
with them, because then I would not be able to call doFoo()
by itself (meaning not from doFooPlus()
) and still get transaction atomicity for the insert and delete executed by doFoo()
.
The comment about the state of the transaction "not [being] affected" by the nested BEGIN does seem to imply that PostgreSQL will not count it, and will in fact completely ignore it, but the quoted phrase doesn't quite make it clear enough to me that I wasn't going to ask this on Stack Overflow. I'm still holding out an ounce of hope that PostgreSQL will still count the nested BEGIN towards some kind of internal "nesting level", which will be decremented by the first END, and then decremented again by the second END, causing the entire sequence of statements to be treated as one atomic transaction.
So, can someone please confirm whether PostgreSQL does this or not? And if not, can you please provide recommendations on how best to solve this problem in my code? I was thinking of adding a bool parameter to allow the caller of doFoo()
to specify whether to create a transaction or not, to which doFooPlus()
could pass false.
Edit: For anyone who's interested, I realized today that I could test this question myself fairly easily, simply by writing a program that does something like what the above example code is trying to do, and then examining its effect on the database.
I won't go into too much detail regarding the internals of the program, but the command below basically runs create table t1 (a int, b int, c int ); insert into t1 (a,b,c) values (0,0,0);
, then runs each of the given SQL statements in order, and finally prints the resulting table data, so you can see that the second begin
and the final rollback
were completely ignored:
> pgtestabc begin begin 'update t1 set a=1;' 'update t1 set b=1;' end 'update t1 set c=1;' rollback;
executing "begin"...done
executing "begin"...done
executing "update t1 set a=1;"...done
executing "update t1 set b=1;"...done
executing "end"...done
executing "update t1 set c=1;"...done
executing "rollback"...done
1|1|1
Also note that you can't do this exact test by simply running the SQL statements in a batch from a GUI client such as pgAdmin III. That particular client seems to do some magic regarding transactions; it seems to enclose the batch in an implicit transaction, so that a rollback
statement will cause the previous statements to be rolled back (even though you also get a "NOTICE: there is no transaction in progress" message in the Messages pane...), except it also still somehow respects begin
...end
blocks (ignoring nested begin
statements as demonstrated above) within the batch, which confusingly seems awfully similar to autonomous transactions, which postgres does not support, as I believe we've established in this thread. So, for example, if you were to run the above 7 statements directly in pgAdmin III, the data you'd end up getting is 1|1|0
.
But regardless of that unintentional obfuscation, the incontrovertible conclusion is that postgres does not count nesting levels of begin
...end
blocks, so you must take care to only ever place yourself within one top-level begin
...end
block.
As the documentation says, a second BEGIN
does not affect the transaction state. That means it cannot cause the following COMMIT
to be ignored. So the first COMMIT
will indeed commit the transaction, and everything after that will proceed as you've described.
The simplest solution is to shift the responsibility for transaction control to the caller. Put your BEGIN
/ COMMIT
outside the call to doFooPlus()
, and you no longer need to worry about which subroutine is in charge of initiating the transaction. Even simpler, implement foo()
and fooPlus()
as server-side functions - which are inherently atomic - and forget about client-side transaction control altogether.
That said, the kind of ad-hoc transaction management pattern in your example can be useful. Ultimately, implementing this would just involve passing some additional information alongside the PGConn
pointer, and checking it around every BEGIN
/ COMMIT
/ ROLLBACK
.
Of course, this whole pattern is a little unwieldy; if you need to do it more than a couple of times, you'd want something to encapsulate the connection, the transaction control functions and the state (a simple depth counter would probably do it). But before setting out to write your own API, I'd take a look at what's already out there.