Tags: sql, sql-server, pandas, sqlalchemy, airflow

SQL Deadlock on sys.tables within pandas > to_sql > check_case_sensitive


I have a problem where I see deadlocks on my Azure SQL Database. They have occurred since we moved the code from a virtual machine to Airflow, which means the most obvious changes are:

  • Windows Server 2022 Datacenter Azure Edition to Kubernetes 1.25.6
  • Version changes of SQLAlchemy and/or pandas.

df.to_sql is always called with if_exists='replace', and it always crashes in check_case_sensitive, where it deadlocks on sys.tables. If the deadlock were on an actual data table, I would understand it as something like a transaction-commit issue; if I were querying sys.tables myself, I might also understand it, but the error is raised deep inside the module. The case check exists only to raise the warning "The provided table name '{name}' is not found exactly as such in the database after writing the table, possibly due to case sensitivity issues. Consider using lower case table names." without even fixing anything. This is the query it deadlocks on:

select [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
from [INFORMATION_SCHEMA].[TABLES]
where [INFORMATION_SCHEMA].[TABLES].[TABLE_SCHEMA] = CAST('dbo' as nvarchar(max))
    and [INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST('BASE TABLE' as nvarchar(max))
order by [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
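
For what it's worth, the metadata lookup that deadlocks can be reproduced outside pandas with a plain SQLAlchemy Inspector call; a minimal sketch, assuming a placeholder connection string:

from sqlalchemy import create_engine, inspect

# Placeholder DSN; substitute your own Azure SQL connection string.
engine = create_engine(
    "mssql+pyodbc://user:password@server.database.windows.net/mydb"
    "?driver=ODBC+Driver+18+for+SQL+Server"
)

# check_case_sensitive boils down to this: list all table names in the
# target schema and compare them against the name passed to to_sql. The
# INFORMATION_SCHEMA.TABLES query above is what the mssql dialect emits
# for this call.
insp = inspect(engine)
table_names = insp.get_table_names(schema="dbo")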

We have no full transaction log on the SQL database. What options do I have to track down the problem? I'm out of ideas.

Traceback:

Traceback (most recent call last):
  File "/opt/airflow/dags/repo/dags/xyz.py", line 308, in xyzabc
    df.to_sql(tablename, con=engine, if_exists=if_exists, dtype=datatypedict, index=False, chunksize=10000)
  File "/home/airflow/.local/lib/python3.10/site-packages/pandas/core/generic.py", line 2878, in to_sql
    return sql.to_sql(
  File "/home/airflow/.local/lib/python3.10/site-packages/pandas/io/sql.py", line 769, in to_sql
    return pandas_sql.to_sql(
  File "/home/airflow/.local/lib/python3.10/site-packages/pandas/io/sql.py", line 1932, in to_sql
    self.check_case_sensitive(name=name, schema=schema)
  File "/home/airflow/.local/lib/python3.10/site-packages/pandas/io/sql.py", line 1832, in check_case_sensitive
    table_names = insp.get_table_names(schema=schema or self.meta.schema)
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 266, in get_table_names
    return self.dialect.get_table_names(
  File "<string>", line 2, in get_table_names
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 55, in cache
    ret = fn(self, con, *args, **kw)
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/dialects/mssql/base.py", line 2613, in wrap
    return _switch_db(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/dialects/mssql/base.py", line 2655, in _switch_db
    return fn(*arg, **kw)
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/dialects/mssql/base.py", line 3065, in get_table_names
    table_names = [r[0] for r in connection.execute(s)]
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1385, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
    ret = self._execute_context(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1948, in _execute_context
    self._handle_dbapi_exception(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2129, in _handle_dbapi_exception
    util.raise_(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1905, in _execute_context
    self.dialect.do_execute(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('40001', '[40001] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Transaction (Process ID 418) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. (1205) (SQLExecDirectW)')
[SQL: SELECT [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME] 
FROM [INFORMATION_SCHEMA].[TABLES] 
WHERE [INFORMATION_SCHEMA].[TABLES].[TABLE_SCHEMA] = CAST(? AS NVARCHAR(max)) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max)) ORDER BY [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]]
[parameters: ('dbo', 'BASE TABLE')]
(Background on this error at: https://sqlalche.me/e/14/dbapi)


Solution

  • As Martin Smith pointed out in the comments:

    In Azure you can get the deadlock graphs by querying the master database, as described in Lesson Learned #19: How to obtain the deadlocks of your Azure SQL Database or Managed Instance?. You should do that to see the exact cause.
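
    A sketch of that lookup, run from Python against the master database of the logical server (the connection string is a placeholder, and the function and column names are as given in the linked article; verify them against your service version):

    import pandas as pd
    from sqlalchemy import create_engine, text

    # Note: connect to the *master* database of the logical server,
    # not to the user database that deadlocked.
    master_engine = create_engine(
        "mssql+pyodbc://user:password@server.database.windows.net/master"
        "?driver=ODBC+Driver+18+for+SQL+Server"
    )

    # Azure SQL exposes deadlock reports through this extended-events
    # telemetry function, per the Lesson Learned #19 article.
    query = text("""
        SELECT CAST(event_data AS xml) AS deadlock_xml
        FROM sys.fn_xe_telemetry_blob_target_read_file('dl', null, null, null)
        WHERE object_name = 'database_xml_deadlock_report'
    """)

    with master_engine.connect() as conn:
        deadlocks = pd.read_sql(query, conn)

    # The XML in each row is a deadlock graph: the competing sessions,
    # the statements they ran, and the resources they were blocked on.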

    Doing that, we found out that the deadlocks involved completely different processes working on the same database, some running on Airflow and some not, which we probably just hadn't noticed before. It did not deadlock on itself, on the previous query, or because of multithreading.

    Several possible explanations are:

    • There was an external cause that was independent of the migration (e.g. a busy database pool).
    • We were watching the logs more closely during the migration, so we noticed deadlocks that had simply gone unseen before.
    • Airflow starts tasks at more precise times, whereas on Windows processes get staggered when you start 10 at once, leading to more overlap under Airflow (see the sketch after this list).
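
    If the overlap theory holds for you, one mitigation is to serialize the database-writing tasks with an Airflow pool; a sketch, assuming a pool named azure_sql has been created with a single slot:

    from airflow.decorators import task

    # Assumes a pool "azure_sql" with 1 slot, created via the UI or:
    #   airflow pools set azure_sql 1 "serialize Azure SQL writes"
    # Tasks that share the pool never run at the same time, so their
    # to_sql calls can no longer overlap with each other.
    @task(pool="azure_sql")
    def write_to_db(tablename: str):
        ...  # build df/engine and call df.to_sql(...) here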

    We have not found the exact cause, but I am fairly sure it was not a pandas-specific issue. The answer for anyone else: check the deadlock graphs, because the cause is probably not what you think it is. It is most probably not SQLAlchemy or pandas deadlocking itself. On Azure Elastic Pool, check your pool usage; on SQL Server, check your resources.

    You do not need Azure SQL Auditing or other optional transaction-style monitoring; deadlock reporting is a basic SQL Server system function.
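
    Independent of the root cause, the 1205 error text literally says "Rerun the transaction", so while investigating you can wrap the to_sql call in a small retry; a sketch with illustrative names:

    import time
    from sqlalchemy.exc import DBAPIError

    def to_sql_with_retry(df, tablename, engine, retries=3, wait=5.0, **kwargs):
        """Rerun df.to_sql when it is chosen as a deadlock victim (SQLSTATE 40001)."""
        for attempt in range(retries):
            try:
                df.to_sql(tablename, con=engine, **kwargs)
                return
            except DBAPIError as exc:
                # 40001 / 1205 means "deadlock victim, rerun the transaction".
                if "40001" in str(exc.orig) and attempt < retries - 1:
                    time.sleep(wait * (attempt + 1))  # simple linear backoff
                    continue
                raise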