Search code examples
pythonsql-servertransactionsrollback

Potential problems rolling back multiple-line SQL Transaction


I need to insert a CSV file into a table on SQL Server using Python (BULK INSERT is turned off). Instead of using SQLAlchemy I'm writing my own function (may God forgive me). I'm creating lists of SQL code as strings

sql_code_list = ["insert into table_name values (1,'aa'),(2,'ab'),(3,'ac')...(100,'az')", 
                 "insert into table_name values (101,'ba'),(102,'bb'),(103,'bc')...(200,'bz')"]

and I plan to run them in the DB using pyodbc package one by one. To ensure data integrity, I want to use BEGIN TRANS ... ROLLBACK / COMMIT TRANS ... syntaxis. So I want to send command

DECLARE @TransactionName varchar(20) = 'TransInsert'
BEGIN TRANS @TransactionName

then send all my ```INSERT`` statements, and send on success

DECLARE @TransactionName varchar(20) = 'TransInsert'
COMMIT TRANS @TransactionName

or on failure

DECLARE @TransactionName varchar(20) = 'TransInsert'
ROLLBACK TRANS @TransactionName

There will be many INSERT statements, let's say 10,000 statements each inserting 100 rows, and I plan to send them from the same connection.cursor object but in multiple batches. Does this overall look like a correct procedure? What problems may I run into when I send these commands from a Python application?


Solution

  • There is no need for a named transaction here.

    You could submit a transactional batch of multiple statements like this to conditionally rollback and throw on error:

    SET XACT_ABORT, NO_COUNT ON;
    BEGIN TRY
        BEGIN TRAN;
        <insert-statements-here>;
        COMMIT;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK;
        THROW;
    END CATCH;
    

    The maximum SQL Server batch size is 64K * and the default network packet size is 4K, so each batch may be up to 256MB by default. 10K inserts will likely fit within that limit so you could try sending all in a single batch and break it into multiple smaller batches only if needed.

    An alternative method to insert multiple rows is with an INSERT...SELECT from a table-valued parameter source. See this answer for an example of passing TVP value. I would expect much better performance with that technique because it avoids parsing a large batch and SQL Server internally bulk-inserts TVP data into tempdb.