Tags: java, oracle-database, jpa

Looks as if Oracle FOR UPDATE applies irrelevant locks


I have a piece of Java code that works with a JPA repository and uses a plain single-table native-query select. This is the repo:

@Repository
public interface MyTableRepository extends JpaRepository<MyTableEntity, Long> {

    @Query(value = """
        SELECT rowid, field1, field2, status
        FROM my_table
        WHERE field2 = :value AND ROWNUM <= 10000
        FOR UPDATE OF STATUS SKIP LOCKED
        """,
        nativeQuery = true)
    List<MyTableEntity> getEntities(@Param("value") String value);

}

and entity:

@Data
@Entity
public class MyTableEntity {

    private String rowid;

    @Id
    private Long field1;
    private String field2;
    private String status;

}

my DAO method:

    @Transactional
    public List<MyTableEntity> getTransactions(String value) {
        List<MyTableEntity> entities = myTableRepository.getEntities(value);
        log.info("Count: {}", entities.size());
        return entities;
    }

MY_TABLE's structure is this:

  CREATE TABLE MY_TABLE
   (    "FIELD1" NUMBER(15,0), 
    "FIELD2" CHAR(6 BYTE), 
    "STATUS" VARCHAR2(20 BYTE)
   );
  CREATE UNIQUE INDEX MY_TABLE_IDX1 ON MY_TABLE ("FIELD1");
  ALTER TABLE MY_TABLE MODIFY ("FIELD1" NOT NULL ENABLE);

and contains these record counts for the different field2 values:

SELECT count(1), field2 FROM my_table group by field2 order by 2

30424   value1
21171   value2
10890   value3
16690   value4
11198   value5

I expect that whenever the repo method getEntities is invoked, it will return a list of size 10000 (AND ROWNUM <= 10000), whether it is invoked in a single thread or in 5 parallel threads, one per possible field2 value.
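The parallel runs below are driven by one thread per field2 value. A minimal self-contained sketch of such a harness follows; the repository call is stubbed out here, so the class name and the stub are illustrative only, not the actual application code:

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelFetchSketch {

    // Stand-in for myTableRepository.getEntities(value); the real call goes
    // through the Spring Data repository inside a transaction.
    static List<String> getEntities(String value) {
        return Collections.nCopies(10_000, value);
    }

    // Fire one fetch per field2 value, each on its own thread, and
    // collect the resulting list sizes per value.
    static Map<String, Integer> runParallel(List<String> values) {
        ExecutorService pool = Executors.newFixedThreadPool(values.size());
        try {
            Map<String, Future<Integer>> futures = new LinkedHashMap<>();
            for (String v : values) {
                futures.put(v, pool.submit(() -> getEntities(v).size()));
            }
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Integer>> e : futures.entrySet()) {
                try {
                    counts.put(e.getKey(), e.getValue().get());
                } catch (InterruptedException | ExecutionException ex) {
                    throw new RuntimeException(ex);
                }
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        runParallel(List.of("value1", "value2", "value3", "value4", "value5"))
                .forEach((v, n) -> System.out.println(v + " -> " + n));
    }
}
```

With the stub, every thread of course reports 10000; against the real repository, the counts are what the logs below show.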

All is fine and predictable in the single-threaded case; here is a run for field2 value value1:

2023-11-06 13:14:48,198 INFO  [     test-value1-1] [InputStorage.java:58]   Count: 10000

When it's run in two parallel threads for field2 values value1,value2 it's still OK:

2023-11-06 13:13:23,505 INFO  [     test-value2-1] [InputStorage.java:58]   Count: 10000
2023-11-06 13:13:23,505 INFO  [     test-value1-1] [InputStorage.java:58]   Count: 10000

It's still OK for three parallel threads (for field2 values value1,value2,value3). The disaster happens when I run it in 4 parallel threads for field2 values value1,value2,value3,value4:

2023-11-06 13:16:20,961 INFO  [     test-value2-1] [InputStorage.java:58]   Count: 9857
2023-11-06 13:16:20,962 INFO  [     test-value4-1] [InputStorage.java:58]   Count: 10000
2023-11-06 13:16:20,962 INFO  [     test-value3-1] [InputStorage.java:58]   Count: 10000
2023-11-06 13:16:21,181 INFO  [     test-value1-1] [InputStorage.java:58]   Count: 9973

and it gets yet more disastrous being run in 5 threads for value1,value2,value3,value4,value5:

2023-11-06 13:20:54,355 INFO  [     test-value1-1] [InputStorage.java:58]   Count: 9401
2023-11-06 13:20:54,355 INFO  [     test-value2-1] [InputStorage.java:58]   Count: 9542
2023-11-06 13:20:54,497 INFO  [     test-value4-1] [InputStorage.java:58]   Count: 9976
2023-11-06 13:20:54,505 INFO  [     test-value5-1] [InputStorage.java:58]   Count: 9909
2023-11-06 13:20:54,552 INFO  [     test-value3-1] [InputStorage.java:58]   Count: 10000

Of course, if I remove FOR UPDATE OF STATUS SKIP LOCKED from the SQL statement in the repository, everything is fine in all cases, single- or multi-threaded.

The question is: how can the results of purely independent queries (independent because their conditions are strictly orthogonal) depend on each other in such a strange and unstable way? How can locks applied by one SELECT with one condition influence which rows another SELECT, with an entirely different condition, is able to return?

I use @Transactional here just for completeness; all this is a simplification of my application code, where I update the selected entities.

I use Oracle Database 23c Free, Release 23.0.0.0.0 - Developer-Release Version 23.2.0.0.0


Solution

  • The documentation says:

    Oracle Database uses a queuing mechanism for acquisition of row locks. If a transaction requires a row lock, and if the row is not already locked, then the transaction acquires a lock in the row's data block. The transaction itself has an entry in the interested transaction list (ITL) section of the block header. Each row modified by this transaction points to a copy of the transaction ID stored in the ITL. Thus, 100 rows in the same block modified by a single transaction require 100 row locks, but all 100 rows reference a single transaction ID.

    and

    When a transaction ends, the transaction ID remains in the ITL section of the data block header. If a new transaction wants to modify a row, then it uses the transaction ID to determine whether the lock is active.

    As described in this old blog entry, if the interested transaction list (ITL) is full:

    When we started multiple, concurrent consumer programs, one wait event was immediately obvious and overwhelming:

    enq: TX - allocate ITL entry

    So, clearly, the root cause of all my problems and headaches is a shortage of ITL entries in the header of the data block. So, the blocking is happening at the block-level, not the row-level. This explains why locks are not being taken on rows that are not locked, even when the cursor has not yet identified 100 candidate rows.

    I'm not sure that's exactly what's happening here, but...

    Since INITRANS defaults to 1, if you have multiple rows in the same block with different values, the first row selected for update for the first value takes that block's ITL slot. Subsequent rows for the same value belong to the same transaction, so they share that slot. But a row in the same block selected by a different transaction - i.e. for a different value - sees the block as locked and is silently skipped because of the SKIP LOCKED clause: it skips all TX locks, not just row-level TX locks.
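    If you want to check whether this is the mechanism at play, you can inspect the table's current INITRANS setting and the accumulated ITL waits on the segment. A sketch using the standard data-dictionary and segment-statistics views (run the second query as a suitably privileged user, and adjust names to your schema):

    ```sql
    -- Current INITRANS for the table (defaults to 1 for heap tables)
    SELECT ini_trans FROM user_tables WHERE table_name = 'MY_TABLE';

    -- Accumulated ITL waits on the segment since instance startup
    SELECT value
    FROM   v$segment_statistics
    WHERE  object_name = 'MY_TABLE'
    AND    statistic_name = 'ITL waits';
    ```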

    Or as that blog concludes:

    However, it’s important to understand that the skip locked clause means that Oracle will skip any time a TX lock is encountered, including TX locks that are happening at the block-level. The problem is, if you’re encountering non-row level locking, skip locked is still going to skip over the locked resource and move on. No error will be reported! This may lead to unpredictable results. The moral of the story is that skip locked doesn’t necessarily mean skip locked rows, it means skip any locked resource.

    That blog gives, as the fix for its issue, the suggestion alter table ... move nologging initrans 110. You don't need to switch to nologging; you can just do:

    alter table my_table move initrans 5
    

    or with a higher number to handle more concurrent transactions.
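    One caveat, which is general Oracle behaviour rather than anything specific to this case: ALTER TABLE ... MOVE relocates the rows and changes their rowids, which marks the table's indexes UNUSABLE. So the unique index created above needs a rebuild afterwards:

    ```sql
    alter index my_table_idx1 rebuild;

    -- optionally confirm it is usable again
    select status from user_indexes where index_name = 'MY_TABLE_IDX1';
    ```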