Tags: python · pandas · pyodbc · bulk-insert · executemany

DataFrame to SQL Server using executemany from pyodbc


I am trying to load data from a dataframe into SQL Server using pyodbc, which inserts row by row and is very slow.

I have tried the two approaches below, which I found online (on Medium), and I don't see any improvement in performance.

I am running against Azure SQL, so SQLAlchemy is not an easy connection method for me. Please see the approaches I followed; is there any other way to improve the performance of the bulk load?

Method 1

cursor = sql_con.cursor()
cursor.fast_executemany = True
for row_count in range(0, df.shape[0]):
    chunk = df.iloc[row_count:row_count + 1, :].values.tolist()
    tuple_of_tuples = tuple(tuple(x) for x in chunk)
    cursor.executemany("INSERT INTO table ([x],[Y]) values (?,?)", tuple_of_tuples)

Method 2

cursor = sql_con.cursor()
for row_count in range(0, ProductInventory.shape[0]):
    chunk = ProductInventory.iloc[row_count:row_count + 1, :].values.tolist()
    tuple_of_tuples = tuple(tuple(x) for x in chunk)
    cursor.executemany("INSERT INTO table ([x],[Y]) values (?,?)", tuple_of_tuples)

Can anyone tell me why the performance does not improve even by 1%? It still takes the same amount of time.
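
Is the problem that executemany only ever receives one row per call here? For reference, this is what I think the batched version would look like, with a single executemany call over the whole dataframe (the table and column names are the same placeholders as in the snippets above):

cursor = sql_con.cursor()
cursor.fast_executemany = True

# Build one parameter list for the whole dataframe instead of one list per row.
params = list(df.itertuples(index=False, name=None))

cursor.executemany("INSERT INTO table ([x],[Y]) values (?,?)", params)
sql_con.commit()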


Solution

  • I am running against Azure SQL, so SQLAlchemy is not an easy connection method for me.

    Perhaps you just need to get over that hurdle first. Then you can use pandas' to_sql along with fast_executemany=True. For example:

    from sqlalchemy import create_engine
    #
    # ...
    #
    engine = create_engine(connection_uri, fast_executemany=True)
    df.to_sql("table_name", engine, if_exists="append", index=False)
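
    The speed-up comes from pyodbc sending the INSERT parameters to the server as bulk parameter arrays rather than one row per round trip. For very large frames, to_sql's standard chunksize argument can cap how many rows go into each batch; a small sketch, assuming the engine created above:

    df.to_sql("table_name", engine, if_exists="append", index=False, chunksize=10000)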
    

    If you have a working pyodbc connection string, you can convert it to a SQLAlchemy connection URI like so:

    import urllib.parse
    connection_uri = 'mssql+pyodbc:///?odbc_connect=' + urllib.parse.quote_plus(connection_string)
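
    Putting the pieces together, a minimal sketch of the whole flow; the server, database, credentials, and table name below are made-up placeholders, and the driver name assumes the Microsoft ODBC Driver 17 for SQL Server is installed:

    import urllib.parse
    from sqlalchemy import create_engine

    # Placeholder pyodbc connection string for Azure SQL; substitute your own values.
    connection_string = (
        "Driver={ODBC Driver 17 for SQL Server};"
        "Server=tcp:myserver.database.windows.net,1433;"
        "Database=mydb;Uid=myuser;Pwd=mypassword;"
        "Encrypt=yes;"
    )

    # Convert the pyodbc string to a SQLAlchemy URI and enable fast_executemany.
    connection_uri = "mssql+pyodbc:///?odbc_connect=" + urllib.parse.quote_plus(connection_string)
    engine = create_engine(connection_uri, fast_executemany=True)

    # Append the dataframe to the target table without writing the index column.
    df.to_sql("table_name", engine, if_exists="append", index=False)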