Tags: sql-server, bulkinsert, pyodbc

pyodbc - very slow bulk insert speed


With this table:

CREATE TABLE test_insert (
    col1 INT,
    col2 VARCHAR(10),
    col3 DATE
)

the following code takes 40 seconds to run:

import pyodbc
from datetime import date

conn = pyodbc.connect('DRIVER={SQL Server Native Client 10.0};'
    'SERVER=localhost;DATABASE=test;UID=xxx;PWD=yyy')

# build 10,000 identical rows
rows = []
row = [1, 'abc', date.today()]
for i in range(10000):
    rows.append(row)

cursor = conn.cursor()
cursor.executemany('INSERT INTO test_insert VALUES (?, ?, ?)', rows)
conn.commit()

The equivalent code with psycopg2 takes only 3 seconds, and I doubt SQL Server is really that much slower than PostgreSQL. Any ideas on how to improve bulk insert speed when using pyodbc?
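
For reference, the psycopg2 comparison looks roughly like this (a sketch; the connection details are assumed to mirror the pyodbc example):

import psycopg2
from datetime import date

conn = psycopg2.connect('host=localhost dbname=test user=xxx password=yyy')

rows = [(1, 'abc', date.today())] * 10000

cursor = conn.cursor()
# psycopg2 uses %s placeholders rather than ODBC's ?
cursor.executemany('INSERT INTO test_insert VALUES (%s, %s, %s)', rows)
conn.commit()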

EDIT: Added some notes following ghoerz's discovery.

In pyodbc, the flow of executemany is:

  • prepare statement
  • loop for each set of parameters
    • bind the set of parameters
    • execute

In ceODBC, by contrast, the flow of executemany is:

  • prepare statement
  • bind all parameters
  • execute
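
That per-row bind/execute step is the difference: with pyodbc, executemany here behaves much like calling execute in a loop yourself, roughly equivalent to this sketch:

# Roughly what pyodbc's executemany amounts to in this version:
# the statement is prepared once, but each row is bound and
# executed in its own round trip to the server.
for r in rows:
    cursor.execute('INSERT INTO test_insert VALUES (?, ?, ?)', r)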

Solution

  • I was having a similar issue with pyODBC inserting into a SQL Server 2008 database using executemany(). When I ran a profiler trace on the SQL side, pyODBC was creating a connection, preparing the parameterized insert statement, and executing it for one row. It would then unprepare the statement and close the connection, and repeat this entire process for each row.

    I wasn't able to find any solution in pyODBC that avoided this behavior. I ended up switching to ceODBC for connecting to SQL Server, and it used the parameterized statements correctly.
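
For anyone hitting this on a more recent stack: pyodbc 4.0.19 added a fast_executemany cursor attribute that makes executemany bind all parameter sets up front and send them in a single batch, much like the ceODBC flow described above. A minimal sketch, assuming a recent driver such as ODBC Driver 17 for SQL Server:

import pyodbc
from datetime import date

conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=localhost;DATABASE=test;UID=xxx;PWD=yyy')

rows = [(1, 'abc', date.today())] * 10000

cursor = conn.cursor()
# Pack all parameter sets into a single buffer and execute once,
# instead of one bind/execute round trip per row.
cursor.fast_executemany = True
cursor.executemany('INSERT INTO test_insert VALUES (?, ?, ?)', rows)
conn.commit()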