Search code examples
sqloracleoracle12cdatabase-deadlocks

Oracle 12c: Parallel SELECT FOR UPDATE NOWAIT on disjoint sets of rows causes ORA-00054: resource busy


I am running following query in several concurrent processes:

SELECT A_ID, B_ID, C_ID, C_STATUS, D_ID
FROM A NATURAL JOIN B NATURAL JOIN C 
WHERE A_ID IN (...)
FOR UPDATE OF C_STATUS, D_ID NOWAIT;
  • Table A: A_ID (PK)
  • Table B: B_ID (PK), A_ID (FK)
  • Table C: C_ID (PK), C_STATUS, B_ID (FK), D_ID (FK)
  • Table D: D_ID (PK)

Each process has its own set of values in the IN (...) list and these sets are guaranteed to be disjoint.

Not sure if it matters, but the FK to table D is also disjoint -- both before and after update which is done later on.

I am however occasionally getting ORA-00054: resource busy, which I read as "two processes tried to lock the same row for update in a NOWAIT statement".

Before I had the FOR UPDATE OF C_STATUS, D_ID NOWAIT clause there the parallel queries were waiting for other to finish (waiting for release of the lock) and rarely I was getting deadlocks later during attempt to update the respective rows of table C:

Deadlock graph:

                        ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name           process session holds waits  process session holds waits
TX-000F001F-0000F3B5-.. 39    1414     X             75     835           S
TX-0009000B-000124A5-.. 75     835     X             39    1414           S

Locked rows are from table C. However when checking debug log and rowid on blocking rows I find that at least one of the processes was not supposed to touch the row.

Any idea why am I getting resource busy / deadlock when updating disjoint rows by multiple processes? Why is Oracle locking rows which are not actually being used??

EDIT: I was able to narrow the issue down to this bash script:

#!/bin/bash
sub(){
sqlplus -S "$DB_ACCESS" << EOF
exec dbms_lock.sleep($2);
select '$1:'||C_ID from C where C_ID in ($3)
for update nowait;
exec dbms_lock.sleep(2);
rollback;
EOF
}
sub 1 0.1 1510223
sub 2 0.3 1510600
sub 3 0.5 1512100
wait

You can see that C_IDs are different, and I checked that parent B_ID and grandparent A_ID are different as well for all three.

And I am getting following output:

PL/SQL procedure successfully completed.
'1:'||C_ID
------------------------------------------
1:1510223
PL/SQL procedure successfully completed.
Rollback complete.
PL/SQL procedure successfully completed.
'2:'||C_ID
------------------------------------------
2:1510600
PL/SQL procedure successfully completed.
Rollback complete.
PL/SQL procedure successfully completed.
select '3:'||C_ID from C where C_ID in (1512100)
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
PL/SQL procedure successfully completed.
Rollback complete.

I wish I could provide sample data but I am only able to reproduce this issue after loading ~1000000 rows into table C.


Solution

  • OK, I found the root cause. It was an ITL lock & wait.

    More info here: ITL waits demystified

    The trick was that the three rows were all part of the same physical block which was almost full and locks for these rows are also stored in the physical block. There was not enough space for the third lock and therefore the third transaction would wait.