I need to insert a CSV file into a table on SQL Server using Python (BULK INSERT
is turned off). Instead of using SQLAlchemy,
I'm writing my own function (may God forgive me). I'm creating lists of SQL statements as strings:
sql_code_list = ["insert into table_name values (1,'aa'),(2,'ab'),(3,'ac')...(100,'az')",
"insert into table_name values (101,'ba'),(102,'bb'),(103,'bc')...(200,'bz')"]
and I plan to run them in the DB using pyodbc
package one by one. To ensure data integrity, I want to use BEGIN TRAN ... COMMIT TRAN / ROLLBACK TRAN
syntax. So I want to send the command
DECLARE @TransactionName varchar(20) = 'TransInsert'
BEGIN TRAN @TransactionName
then send all my `INSERT` statements, and on success send
DECLARE @TransactionName varchar(20) = 'TransInsert'
COMMIT TRAN @TransactionName
or on failure
DECLARE @TransactionName varchar(20) = 'TransInsert'
ROLLBACK TRAN @TransactionName
There will be many INSERT statements, let's say 10,000 statements each inserting 100 rows, and I plan to send them from the same connection.cursor
object but in multiple batches. Does this overall look like a correct procedure? What problems may I run into when I send these commands from a Python application?
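In Python, the plan would look roughly like this (a sketch; the fake cursor below only stands in for a real pyodbc cursor so the code can run without a database):

```python
TRAN_DECL = "DECLARE @TransactionName varchar(20) = 'TransInsert';"

def run_in_transaction(cursor, statements):
    """Send the named BEGIN TRAN, every INSERT, then COMMIT; ROLLBACK on error.

    Assumes a pyodbc cursor whose connection was opened with autocommit=True,
    so these explicit transaction commands take effect. The DECLARE is sent in
    the same batch as each transaction command because T-SQL variables do not
    survive across batches.
    """
    cursor.execute(TRAN_DECL + " BEGIN TRAN @TransactionName;")
    try:
        for stmt in statements:
            cursor.execute(stmt)
        cursor.execute(TRAN_DECL + " COMMIT TRAN @TransactionName;")
    except Exception:
        cursor.execute(TRAN_DECL + " ROLLBACK TRAN @TransactionName;")
        raise

class _FakeCursor:
    """Stand-in for a pyodbc cursor, recording each SQL string sent."""
    def __init__(self):
        self.sent = []
    def execute(self, sql):
        self.sent.append(sql)

demo = _FakeCursor()
run_in_transaction(demo, ["insert into table_name values (1,'aa'),(2,'ab')"])
```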
There is no need for a named transaction here.
You could submit a transactional batch of multiple statements like this to conditionally roll back and rethrow on error:
SET XACT_ABORT, NOCOUNT ON;
BEGIN TRY
BEGIN TRAN;
<insert-statements-here>;
COMMIT;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK;
THROW;
END CATCH;
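From Python, one way to use this pattern (a sketch, assuming pyodbc and the statement list from the question) is to splice the INSERT statements into that batch text and submit it with a single `cursor.execute` call:

```python
BATCH_TEMPLATE = """\
SET XACT_ABORT, NOCOUNT ON;
BEGIN TRY
    BEGIN TRAN;
    {inserts}
    COMMIT;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK;
    THROW;
END CATCH;"""

def build_transactional_batch(insert_statements):
    """Wrap a list of INSERT statements in the TRY/CATCH batch above."""
    return BATCH_TEMPLATE.format(inserts="\n    ".join(insert_statements))

# Sending it (sketch): if pyodbc raises an error from execute(), THROW fired
# and the transaction was already rolled back server-side.
# cursor.execute(build_transactional_batch(sql_code_list))
```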
The maximum SQL Server batch size is 64K * the network packet size. The default packet size is 4K, so each batch may be up to 256MB by default. 10K inserts will likely fit within that limit, so you could try sending everything in a single batch and break it into multiple smaller batches only if needed.
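If splitting does turn out to be necessary, a simple size-based splitter could look like this (a sketch; the 200MB cap is just an illustrative safety margin under the 256MB default):

```python
def chunk_statements(statements, max_bytes=200 * 1024 * 1024):
    """Yield lists of statements whose combined size stays under max_bytes.

    SQL Server measures batch size in characters (2 bytes each in UTF-16),
    so len(stmt) * 2 approximates each statement's contribution.
    """
    chunk, size = [], 0
    for stmt in statements:
        stmt_size = len(stmt) * 2
        if chunk and size + stmt_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(stmt)
        size += stmt_size
    if chunk:
        yield chunk
```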
An alternative method to insert multiple rows is an INSERT...SELECT
from a table-valued parameter source. See this answer for an example of passing a TVP value. I would expect much better performance with that technique because it avoids parsing a large batch, and SQL Server internally bulk-inserts TVP data into tempdb.
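For reference, a hedged sketch of the TVP route: pyodbc accepts a table-valued parameter as a list of row tuples, and the table type `dbo.IdNameType` and stored procedure `dbo.InsertRows` below are hypothetical names you would define server-side:

```python
def rows_from_pairs(pairs):
    """Shape (id, name) pairs as the list-of-tuples value pyodbc expects for a TVP."""
    return [(i, name) for i, name in pairs]

rows = rows_from_pairs([(1, "aa"), (2, "ab"), (3, "ac")])

# Sketch of the call (requires a live connection; names are hypothetical).
# The procedure would do: INSERT INTO table_name SELECT * FROM @tvp;
# cursor.execute("{CALL dbo.InsertRows (?)}", (rows,))
# cnxn.commit()
```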