python · sql-server · sqlalchemy

database locked when using sqlalchemy to_sql method


I have a function to upload a DataFrame to SQL Server, defined as follows:


import urllib.parse

import sqlalchemy


def uploadtoDatabase(data, columns, tablename):

    connect_string = urllib.parse.quote_plus(
        'DRIVER={ODBC Driver 17 for SQL Server};Server=ServerName;Database=DatabaseName;'
        'Encrypt=yes;Trusted_Connection=yes;TrustServerCertificate=yes'
    )
    engine = sqlalchemy.create_engine(f'mssql+pyodbc:///?odbc_connect={connect_string}',
                                      fast_executemany=False)

    # columns to upload and chunk size (SQL Server allows at most 2100 parameters per statement)
    requireddata = columns
    chunksize = 1000 // len(requireddata)

    # convert boolean columns to integers
    boolcolumns = data.select_dtypes(include=['bool']).columns
    data.loc[:, boolcolumns] = data[boolcolumns].astype(int)

    # convert object columns to strings
    objectcolumns = data.select_dtypes(include=['object']).columns
    data.loc[:, objectcolumns] = data[objectcolumns].astype(str)

    # load the data (the context manager closes the connection when the block ends)
    with engine.connect() as connection:
        data[requireddata].to_sql(tablename, connection, index=False,
                                  if_exists='replace', schema='dbo',
                                  chunksize=chunksize)

    engine.dispose()

I have tried different options, but every time I run this insert (4M records, which takes a while) the database is locked and other processes end up waiting for the Python process to finish.

Is there any way to ensure other database transactions continue to execute while this function is running, other than writing a for loop and calling the function multiple times on smaller batches of the data?


Solution

  • even expanding the table list is frozen in SSMS until I kill the Python process

    You're using if_exists='replace', so the table is dropped and recreated inside the same transaction as the insert. That holds an exclusive metadata (schema) lock on the table for the duration of the load, which is why even expanding the table list in SSMS blocks.

    If you want other sessions to keep working concurrently, create the tables ahead of time, either at design time or at runtime in a separate transaction, and then append the data, as sketched below.
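
    A minimal sketch of that approach, assuming the same pyodbc connection settings and pandas DataFrame as in the question (the helper name upload_append and the 1000-parameter chunking heuristic are illustrative, not prescribed by the answer): the empty table is created in its own short transaction, so the schema lock is released before the long-running append starts.

    import urllib.parse

    import sqlalchemy

    connect_string = urllib.parse.quote_plus(
        'DRIVER={ODBC Driver 17 for SQL Server};Server=ServerName;Database=DatabaseName;'
        'Encrypt=yes;Trusted_Connection=yes;TrustServerCertificate=yes'
    )
    engine = sqlalchemy.create_engine(f'mssql+pyodbc:///?odbc_connect={connect_string}')


    def upload_append(data, columns, tablename):
        # 1) (Re)create the empty table in its own short transaction.
        #    engine.begin() commits when the block exits, so the schema lock
        #    taken by the drop/create is released immediately.
        with engine.begin() as connection:
            data[columns].head(0).to_sql(tablename, connection, index=False,
                                         if_exists='replace', schema='dbo')

        # 2) Append the rows in a separate transaction. Appending takes only
        #    ordinary data locks, so other sessions can keep browsing the
        #    catalog (e.g. expanding the table list in SSMS) while it runs.
        chunksize = 1000 // len(columns)  # same parameter-limit heuristic as the question
        with engine.begin() as connection:
            data[columns].to_sql(tablename, connection, index=False,
                                 if_exists='append', schema='dbo',
                                 chunksize=chunksize)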