Tags: azure-cosmosdb, spring-data-mongodb, mongodb-java, azure-cosmosdb-mongoapi

CosmosDB - MongoDB API - 20 parallel transactions cause error (failed commit, transaction not active) using Java Spring @Transactional(readOnly=true)


We ran into an odd problem when using Azure Cosmos DB with the MongoDB API (4.0) and transactions. When we run exactly 20 parallel requests that each open a read-only transaction (no modifications), closing the transaction throws this error:

Query failed with error code 2 and error message 'Error=2, Details='Response status code does not indicate success: BadRequest (400); Substatus: 1101; ActivityId: c48758d6-7c18-4973-9626-xxxxxxxxxxx; [...] (Message: {"Errors":["Transaction is not active"]}

The error occurs consistently (100% of the time) with 20 parallel requests; 19 parallel requests do not trigger this behaviour. The requests all act on different documents within the same collection, they do not touch the same data, and they do not write any data. Cosmos DB is running in serverless mode, so we are not constrained by RU/s in this scenario.

The client is written in Java with Spring Boot, using spring-boot-starter-data-mongodb. We applied @Transactional(readOnly = true) to certain repository implementation methods to start a transaction and make sure the returned result set is always consistent. The implementation looks like this:

@Override
@Transactional(readOnly = true)
public List<ZZZModel> findAllByXXXId(List<String> xxxIds, LocalDateTime from, LocalDateTime to) {
    validateDate(from, to);
    return this.zzzCrudRepository.findAllBetween(from, to, xxxIds).stream()
            .map(xxxxx).collect(Collectors.toList());
}
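
For context, the underlying repository might be declared roughly like this. This is only a sketch: the @Query filter and the field names xxxId and timestamp are placeholders for illustration, mirroring the anonymized names above, not our actual mapping.

import java.time.LocalDateTime;
import java.util.List;

import org.springframework.data.mongodb.repository.MongoRepository;
import org.springframework.data.mongodb.repository.Query;

// Sketch of the Spring Data repository behind zzzCrudRepository.
// The filter fields ("xxxId", "timestamp") are illustrative assumptions.
public interface ZZZCrudRepository extends MongoRepository<ZZZModel, String> {

    @Query("{ 'xxxId': { $in: ?2 }, 'timestamp': { $gte: ?0, $lte: ?1 } }")
    List<ZZZModel> findAllBetween(LocalDateTime from, LocalDateTime to, List<String> xxxIds);
}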

The error appears at org.springframework.data.mongodb.MongoTransactionManager.doCommit(MongoTransactionManager.java:203)

It looks like Spring's MongoDB transaction manager tries to commit the transaction even though it is readOnly, and fails because somehow the transaction never started. In this example case we only perform a single read operation in the transaction.
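
For reference, @Transactional only takes effect for MongoDB when a MongoTransactionManager bean is registered. A minimal sketch of such a configuration (class name illustrative; older Spring Data versions use MongoDbFactory instead of MongoDatabaseFactory):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.MongoDatabaseFactory;
import org.springframework.data.mongodb.MongoTransactionManager;

@Configuration
public class MongoTransactionConfig {

    // Registering this bean makes @Transactional open a MongoDB session and
    // commit or abort it via MongoTransactionManager (doCommit / doRollback),
    // even for readOnly transactions, which is where the stack trace above originates.
    @Bean
    MongoTransactionManager transactionManager(MongoDatabaseFactory dbFactory) {
        return new MongoTransactionManager(dbFactory);
    }
}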

Is there a limit on Cosmos DB parallel transactions per collection, upon which some sort of retry logic needs to be applied?

(Note: in the above example we could probably get away without a transaction, but in other cases we need to perform several reads that all have to be consistent. This simple example was only chosen to show that even such a trivial request fails when run in 20 parallel transactions.)


Solution

  • The issue here was that while the SQL API for Cosmos DB indexes all properties by default, the MongoDB API for Cosmos DB does not.

    When a collection has no indexes and the transactions have to look up data, every document is scanned, and it appears that all scanned documents are "locked" or "marked dirty". The limit of 20 is not a hard limit: it is simply the point at which, in our application, that many concurrent transactions working on our database can no longer finish within the 5 seconds allotted to a transaction in Cosmos DB.

    Spring then tries to commit or abort the transaction and fails, because Cosmos DB has already aborted it after 5 seconds. (This timeout is fixed and cannot be changed.)

    The fix is to create proper indexes on the MongoDB collections, or to add a $** wildcard index during development while the data model is still changing (see the sketch at the end of this answer). Compound indexes are usually not necessary, except for cases like unique indexes or similar edge cases.

    Please note that the documentation is a bit misleading. The MongoDB API documentation at https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/mongodb-indexing states: "You can also read an overview of indexing in Azure Cosmos DB that's relevant across all APIs." That link forwards to https://learn.microsoft.com/en-us/azure/cosmos-db/index-overview, which is the SQL API's indexing description and states: "By default, Azure Cosmos DB automatically indexes every property for all items in your container without having to define any schema or configure secondary indexes."
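
    A minimal sketch of creating the indexes programmatically with Spring Data's MongoTemplate. The field names ("xxxId", "timestamp") and the collection name ("zzzModel") mirror the anonymized names from the question and are assumptions; the wildcard variant is only meant for development.

    import org.bson.Document;
    import org.springframework.data.domain.Sort;
    import org.springframework.data.mongodb.core.MongoTemplate;
    import org.springframework.data.mongodb.core.index.Index;
    import org.springframework.data.mongodb.core.index.IndexOperations;

    public class ZZZIndexSetup {

        // Sketch only: field and collection names are assumptions.
        public static void createIndexes(MongoTemplate mongoTemplate) {
            IndexOperations indexOps = mongoTemplate.indexOps("zzzModel");

            // Single-field indexes on the fields the read-only query filters on,
            // so the lookup no longer scans (and effectively locks) every document.
            indexOps.ensureIndex(new Index().on("xxxId", Sort.Direction.ASC));
            indexOps.ensureIndex(new Index().on("timestamp", Sort.Direction.ASC));

            // Development-only alternative while the data model is still changing:
            // a wildcard index over all fields, equivalent to
            //   db.zzzModel.createIndex({ "$**": 1 })
            // in the mongo shell.
            mongoTemplate.getCollection("zzzModel")
                    .createIndex(new Document("$**", 1));
        }
    }

    The same indexes can also be declared with @Indexed on the entity fields; note that newer Spring Data / Spring Boot versions only create those automatically when auto-index-creation is explicitly enabled.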