Tags: amazon-redshift, dbt

dbt getting deadlock errors on build runs for random models (Redshift)


We have a couple of environment builds scheduled in dbt. Each environment has several models (10-15).

Each time we run the build, whether scheduled or manual, several of the models fail to complete. Which models fail appears to be random on each run.

The errors are always along these lines:

Database Error in model my_model (models/env1/my_model.sql)
  deadlock detected
  DETAIL:  Process 123 waits for ShareLock on transaction 456; blocked by process 789.
  Process 789 waits for AccessShareLock on relation 123456 of database 654321; blocked by process 123.
  compiled SQL at target/run/my_company/models/env1/my_model.sql

We are running dbt Cloud against our AWS Redshift cluster, which consists of four ra3.xlplus nodes.

We have tried the models both as incremental and as full-refresh (rebuilt each run) materializations. Table sizes vary from 7 MB to 285 GB. We have tried scheduling overnight and roughly every 4 hours. None of these changes made a noticeable difference.

The model queries are varied but all quite simple. We have not found any correlation here either: the same model will run fine on one build and deadlock on the next without any changes.


Solution

  • You have multiple connections open against Redshift, and there are table dependencies between them, i.e. a circular dependency loop: for example, process 1 is updating table A from table B while, at the same time, process 2 is updating table B from table A. The issue only surfaces when the timing of execution lines up a certain way. Since the problem is fairly repeatable, both sessions are likely from dbt, but it is not certain that no other tools are in flight at these times.

    I think you need to look up the two processes (sessions / connections) in question and see which user each belongs to and what actions they are performing. Then look up the tables and the transaction involved, line up the timestamps, and work out what the dependency is and why it is happening. With this information you can develop a plan for ensuring that this circular dependency loop cannot happen again for these tables.
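
    On Redshift, the PIDs and table id in the deadlock message can be traced through the system tables. A hedged sketch follows: the PIDs 123 and 789 and the table id 123456 are the example values from the error message above, so substitute the ones from your own logs.

    ```sql
    -- Who owns the two sessions named in the deadlock message?
    -- (123 and 789 are the example PIDs from the error above.)
    select process as pid, user_name, db_name, starttime as session_start
    from stv_sessions
    where process in (123, 789);

    -- What was each process running around the time of the deadlock?
    select pid, xid, starttime, endtime, trim(querytxt) as sql_text
    from stl_query
    where pid in (123, 789)
    order by starttime desc
    limit 50;

    -- Historical lock conflicts, including the table involved:
    select xact_id, process_id, table_id, xact_start_ts, abort_time
    from stl_tr_conflict
    order by abort_time desc
    limit 20;

    -- Resolve a table_id (e.g. 123456 from the error) to a table name:
    select "schema", "table", table_id
    from svv_table_info
    where table_id = 123456;
    ```

    If both sessions turn out to be dbt runs, one common cause is a dependency dbt does not know about: a model that selects from another model's target table by hard-coded name instead of via `{{ ref('other_model') }}`, so dbt's DAG schedules the two in parallel. Declaring the dependency with `ref()` (or, more bluntly, lowering the connection's thread count) serializes them and removes the opportunity to deadlock.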