I have a function to upload a dataframe to a SQL Server database, defined as follows:
import urllib.parse
import sqlalchemy

def uploadtoDatabase(data, columns, tablename):
    connect_string = urllib.parse.quote_plus(
        'DRIVER={ODBC Driver 17 for SQL Server};Server=ServerName;Database=DatabaseName;'
        'Encrypt=yes;Trusted_Connection=yes;TrustServerCertificate=yes'
    )
    engine = sqlalchemy.create_engine(f'mssql+pyodbc:///?odbc_connect={connect_string}', fast_executemany=False)
    # define the columns to upload and the chunk size (SQL Server allows at most 2100 parameters per statement)
    requireddata = columns
    chunksize = 1000 // len(requireddata)
    # convert boolean columns to integers
    boolcolumns = data.select_dtypes(include=['bool']).columns
    data.loc[:, boolcolumns] = data[boolcolumns].astype(int)
    # convert object columns to strings
    objectcolumns = data.select_dtypes(include=['object']).columns
    data.loc[:, objectcolumns] = data[objectcolumns].astype(str)
    # load the data
    with engine.connect() as connection:
        data[requireddata].to_sql(tablename, connection, index=False, if_exists='replace',
                                  schema='dbo', chunksize=chunksize)
    engine.dispose()
I have tried different options, but every time I execute this insert (4M records, which takes a while), the database is locked and other processes end up waiting on the Python process to finish.
Is there any way to ensure other database transactions continue to execute while this function is running, other than writing a for loop and executing the function multiple times on smaller batches of the data?
Even expanding the table list in SSMS freezes until I kill the Python process.
You're using if_exists='replace', so the table is dropped and recreated inside the load transaction. That holds an exclusive schema (metadata) lock on the table for the duration of the insert, which is why even listing tables in SSMS blocks until the Python process finishes.
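If you want to confirm this while the load is running, one way (a diagnostic sketch only, assuming you have VIEW SERVER STATE permission and reuse the same connect string as in your function) is to query sys.dm_tran_locks from a second connection and look for a Sch-M (schema modification) request:

import sqlalchemy

# run from a *separate* connection while the upload is in progress
engine = sqlalchemy.create_engine('mssql+pyodbc:///?odbc_connect=...')  # placeholder: same connect string as above
lock_query = sqlalchemy.text("""
    SELECT request_session_id, resource_type, request_mode, request_status
    FROM sys.dm_tran_locks
    WHERE resource_database_id = DB_ID()
      AND request_mode IN ('Sch-M', 'Sch-S')
""")
with engine.connect() as connection:
    for row in connection.execute(lock_query):
        print(row)

While your function is mid-load you should see the Sch-M lock granted to the loading session and other sessions' Sch-S requests stuck in WAIT.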
If you want to do this concurrently, create all the tables ahead of time, either at design time or at runtime in a separate transaction, and then append the data.
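A minimal sketch of that approach (upload_append is just an illustrative name, and the connect string is assumed to be the one built in your function): create the empty table in its own short transaction, then stream the 4M rows in with if_exists='append', so the schema lock is only held for the instant it takes to create the table.

import urllib.parse
import sqlalchemy

def upload_append(data, columns, tablename):
    connect_string = urllib.parse.quote_plus(
        'DRIVER={ODBC Driver 17 for SQL Server};Server=ServerName;Database=DatabaseName;'
        'Encrypt=yes;Trusted_Connection=yes;TrustServerCertificate=yes'
    )
    engine = sqlalchemy.create_engine(f'mssql+pyodbc:///?odbc_connect={connect_string}')

    requireddata = columns
    chunksize = 1000 // len(requireddata)

    # step 1: create (or replace) the empty table in its own short transaction;
    # head(0) uploads the schema only, so the Sch-M lock is released as soon as this commits
    with engine.begin() as connection:
        data[requireddata].head(0).to_sql(tablename, connection, index=False,
                                          if_exists='replace', schema='dbo')

    # step 2: append the actual rows; the table's metadata is no longer being modified,
    # so catalog views (and SSMS's table list) stay responsive during the long insert
    with engine.begin() as connection:
        data[requireddata].to_sql(tablename, connection, index=False,
                                  if_exists='append', schema='dbo', chunksize=chunksize)

    engine.dispose()

If you also want other sessions to read the target table itself while the load runs, you could commit chunk by chunk instead of wrapping the whole append in one transaction, but the metadata lock problem you describe is already solved by separating the create from the append.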