I've migrated my application to Azure Cosmos DB SDK v3 (version 3.43.1) and I'm using its bulk functionality to upload somewhere between 200 and 1,000 items.
With the BulkExecutor library and its configuration I was getting a 100% success rate when uploading: all documents were saved.
In v3 I'm getting 429 (request rate too large) responses and only some documents are saved, and I'm not sure how I should handle the ones that were not saved.
ClientOptions setup:
var options = new CosmosDbOptions
{
    AllowBulkExecution = true,
    MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(60),
    MaxRetryAttemptsOnRateLimitedRequests = 19,
};
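For completeness, these options end up on the CosmosClient roughly like this. This is only a sketch: I'm assuming CosmosDbOptions is a thin wrapper that maps 1:1 onto the SDK's CosmosClientOptions, and the connection string, database and container names are placeholders.

using System;
using Microsoft.Azure.Cosmos;

// Sketch: the wrapper options mapped onto the SDK's CosmosClientOptions.
var connectionString = "<cosmos-connection-string>"; // placeholder

var clientOptions = new CosmosClientOptions
{
    AllowBulkExecution = true,                                         // turns on bulk mode in SDK v3
    MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(60),  // total time the SDK may spend retrying 429s
    MaxRetryAttemptsOnRateLimitedRequests = 19                         // 429 retries before the error surfaces to my code
};

var client = new CosmosClient(connectionString, clientOptions);
var container = client.GetContainer("<database>", "<container>");     // placeholders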
I'm using the BulkOperations wrapper as described in the docs.
My main method for bulk uploading looks like this:
public async Task<BulkOperationResponse<TDocument>> ImportAsync(IEnumerable<TDocument> documents)
{
    var bulkOperations = new BulkOperations<TDocument>(documents.Count());

    foreach (var document in documents)
    {
        bulkOperations.Tasks.Add(CaptureOperationResponse(
            _container.Value.CreateItemAsync(document, new PartitionKey(document.PartitionKey)),
            document));
    }

    var response = await bulkOperations.ExecuteAsync();
    return response;
}
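For context, the helper types from that docs sample look roughly like this. This is my own compacted reconstruction, so property names may differ slightly from the sample, but the idea is that every item operation is wrapped so a failure never throws out of Task.WhenAll, and the per-document outcome (including the exception, e.g. a 429) is kept.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Outcome of a single item operation: the original document plus success/failure details.
public class OperationResponse<T>
{
    public T Item { get; set; }
    public double RequestUnitsConsumed { get; set; }
    public bool IsSuccessful { get; set; }
    public Exception CosmosException { get; set; }
}

// Aggregate of a whole bulk run; Failures keeps the documents that need another attempt.
public class BulkOperationResponse<T>
{
    public TimeSpan TotalTimeTaken { get; set; }
    public int SuccessfulDocuments { get; set; }
    public double TotalRequestUnitsConsumed { get; set; }
    public IReadOnlyList<(T Document, Exception Exception)> Failures { get; set; }
}

public class BulkOperations<T>
{
    public readonly List<Task<OperationResponse<T>>> Tasks;
    private readonly Stopwatch _stopwatch = Stopwatch.StartNew();

    public BulkOperations(int operationCount) =>
        Tasks = new List<Task<OperationResponse<T>>>(operationCount);

    public async Task<BulkOperationResponse<T>> ExecuteAsync()
    {
        await Task.WhenAll(Tasks);
        _stopwatch.Stop();
        return new BulkOperationResponse<T>
        {
            TotalTimeTaken = _stopwatch.Elapsed,
            TotalRequestUnitsConsumed = Tasks.Sum(t => t.Result.RequestUnitsConsumed),
            SuccessfulDocuments = Tasks.Count(t => t.Result.IsSuccessful),
            Failures = Tasks.Where(t => !t.Result.IsSuccessful)
                            .Select(t => (Document: t.Result.Item, Exception: t.Result.CosmosException))
                            .ToList()
        };
    }
}

// In my repository this is a private method, so it is called without a class prefix.
public static class BulkHelpers
{
    public static Task<OperationResponse<T>> CaptureOperationResponse<T>(Task<ItemResponse<T>> task, T item)
    {
        return task.ContinueWith(itemResponse =>
        {
            if (itemResponse.IsCompletedSuccessfully)
            {
                return new OperationResponse<T>
                {
                    Item = item,
                    IsSuccessful = true,
                    RequestUnitsConsumed = itemResponse.Result.RequestCharge
                };
            }

            // The task faulted: surface the CosmosException (e.g. a 429) to the caller.
            var aggregate = itemResponse.Exception.Flatten();
            return new OperationResponse<T>
            {
                Item = item,
                IsSuccessful = false,
                CosmosException = aggregate.InnerExceptions.FirstOrDefault(e => e is CosmosException)
                                  ?? aggregate.InnerExceptions.FirstOrDefault()
            };
        });
    }
}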
After inspecting the BulkOperationResponse<T> I often see that only a chunk of the documents were saved.
I have the same issue when trying to bulk delete documents via this method:
public async Task<BulkOperationResponse<TDocument>> DeleteAsync(IEnumerable<TDocument> documents)
{
    var bulkOperations = new BulkOperations<TDocument>(documents.Count());

    foreach (var document in documents)
    {
        bulkOperations.Tasks.Add(CaptureOperationResponse(
            _container.Value.DeleteItemAsync<TDocument>(document.Id, new PartitionKey(document.PartitionKey)),
            document));
    }

    var response = await bulkOperations.ExecuteAsync();
    return response;
}
I'm using shared throughput of 400 RU/s across 4 containers.
How should I retry uploading the failed documents? Is that supposed to be handled by the SDK, or should I retry in my own code?
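By retrying in my own code I mean something roughly like the sketch below. It assumes the Failures list on BulkOperationResponse<TDocument> carries the original document together with the exception, as in the reconstruction above, and three rounds is just an arbitrary number for the example.

// Sketch: re-submit whatever the bulk response reports as failed, for a few rounds.
// This sits next to ImportAsync on the same repository class; three rounds is arbitrary.
public async Task<BulkOperationResponse<TDocument>> ImportWithRetriesAsync(
    IEnumerable<TDocument> documents, int maxRounds = 3)
{
    var response = await ImportAsync(documents);

    for (var round = 1; round < maxRounds && response.Failures.Count > 0; round++)
    {
        // Only the documents that failed (typically with a 429) are sent again.
        var failedDocuments = response.Failures.Select(f => f.Document).ToList();
        response = await ImportAsync(failedDocuments);
    }

    return response;
}

If there is any chance a failed attempt actually wrote the document (e.g. a timeout rather than a 429), UpsertItemAsync would be safer than CreateItemAsync here, since a retried create would otherwise fail with a 409 Conflict.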
Thanks for your comments and answers. I analyzed how the code behaves in the previous version vs. v3.
I noticed that upserting 100 documents with the BulkExecutor took over 2 seconds and consumed over 2,000 RU.
In v3 it took 7 seconds and consumed almost 2,000 RU, and some of the documents failed.
So I decided to scale the throughput up significantly before my code runs and scale it back down after the operation finishes. This gave me the desired result.
I did some reading and there are other ways to handle it, such as retrying the failed documents from the response in my own code, but due to my setup I decided to stick with manually increasing the throughput for the duration of the operation.
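For reference, the scale-up / scale-down around the bulk call looks roughly like the sketch below. Since the 400 RU/s is shared across the 4 containers, the throughput lives on the database rather than on a single container; database here would come from client.GetDatabase(...), and 4,000 RU/s is just an example target, not a recommendation.

// Sketch: temporarily raise the shared (database-level) throughput around the bulk call,
// then restore the original value even if the import throws. 4000 RU/s is only an example.
public async Task<BulkOperationResponse<TDocument>> ImportWithScaledThroughputAsync(
    Database database, IEnumerable<TDocument> documents, int temporaryRuPerSecond = 4000)
{
    // Remember the current shared throughput so it can be restored afterwards.
    int originalRuPerSecond = await database.ReadThroughputAsync() ?? 400;

    await database.ReplaceThroughputAsync(temporaryRuPerSecond);
    try
    {
        return await ImportAsync(documents);
    }
    finally
    {
        await database.ReplaceThroughputAsync(originalRuPerSecond);
    }
}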