Tags: c#, azure, azure-cosmosdb, azure-cosmosdb-sqlapi

How can I fetch documents in batches from CosmosDB and process them?


I am trying to fetch documents from CosmosDB and then loop over the returned documents with foreach. I am doing it as follows:

var productListFromHAPI = 
    await CosmosDb.GetProductDataFromHAPI(brand, deployedCountry,
         primaryLocale, secondaryLocale, _rawDataContainer, log);

var finalListOfObjects = new List<StorelensItemModel_V3>();

foreach (var storeVariantToInsert in productListFromHAPI)
{
    // processing here
}

The problem is that GetProductDataFromHAPI returns millions of documents, and the host runs out of resources no matter how large I make it.

How can I split this up so that I fetch and process 1000 documents at a time? I know I can use SELECT TOP 1000 etc., but then how do I make sure the second round does not fetch the same items again?

I also tried OFFSET and LIMIT, but I could not get it to work.

Pagination does not seem to be a good fit for this use case.


Solution

  • It looks like you are using a wrapper GetProductDataFromHAPI that is calling the Cosmos SDK underneath.

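    The Cosmos SDK's FeedIterator lets you paginate, consuming one page of results at a time (the page size can be capped with QueryRequestOptions.MaxItemCount), so the full result set never has to be held in memory at once: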

    Reference: https://learn.microsoft.com/dotnet/api/microsoft.azure.cosmos.feediterator-1?view=azure-dotnet#examples

    using (FeedIterator<StorelensItemModel_V3> feedIterator = this.Container.GetItemQueryIterator<StorelensItemModel_V3>(
        "query"))
    {
        while (feedIterator.HasMoreResults)
        {
            FeedResponse<StorelensItemModel_V3> response = await feedIterator.ReadNextAsync();
            
            // You can yield the response for an upper layer to be consumed and pass the ContinuationToken to use the next time you want to continue the query
        }
    }
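    A minimal sketch of the "yield the response for an upper layer" idea from the comment in the snippet, assuming the Microsoft.Azure.Cosmos v3 NuGet package and C# 8 async streams. CosmosPaging, ReadPagesAsync and ProcessAllAsync are hypothetical names; the default page size of 1000 comes from the question. QueryRequestOptions.MaxItemCount caps each page, and the iterator's continuation token tracks the position in the result set, which is what keeps a later page from re-reading documents already returned (unlike re-running SELECT TOP 1000).

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class CosmosPaging
{
    // Streams query results one page at a time instead of building one huge list.
    // Pass a continuation token saved from an earlier page to resume where it stopped.
    public static async IAsyncEnumerable<FeedResponse<T>> ReadPagesAsync<T>(
        Container container,
        QueryDefinition query,
        int pageSize = 1000,
        string continuationToken = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // MaxItemCount caps the number of documents returned per round trip.
        var options = new QueryRequestOptions { MaxItemCount = pageSize };

        using FeedIterator<T> iterator =
            container.GetItemQueryIterator<T>(query, continuationToken, options);

        while (iterator.HasMoreResults)
        {
            // At most pageSize items; a page may be smaller (or empty) even when
            // more results remain, so loop on HasMoreResults, not on page size.
            FeedResponse<T> page = await iterator.ReadNextAsync(cancellationToken);
            yield return page;
        }
    }

    // Hypothetical consumer: processes each page and keeps no reference to it,
    // so roughly one page of documents is held in memory at a time.
    public static async Task ProcessAllAsync<T>(
        Container container,
        QueryDefinition query,
        Func<T, Task> processItem,
        CancellationToken cancellationToken = default)
    {
        await foreach (FeedResponse<T> page in ReadPagesAsync<T>(
            container, query, cancellationToken: cancellationToken))
        {
            foreach (T item in page)
            {
                await processItem(item);
            }

            // page.ContinuationToken points at the next page (null after the last one).
            // Persisting it lets a later run resume via ReadPagesAsync(..., continuationToken: token).
        }
    }
}
```

    To replace the question's loop you might call something like ProcessAllAsync<StorelensItemModel_V3>(_rawDataContainer, new QueryDefinition("SELECT * FROM c"), item => ...), assuming _rawDataContainer is a Container and that the query text is adapted to your own filters. The important design choice is to avoid collecting every page into a list such as finalListOfObjects; doing so brings back the original memory problem regardless of page size.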
    

    As a side note, keep in mind the performance tips when constructing and executing queries: https://learn.microsoft.com/azure/cosmos-db/sql/performance-tips-query-sdk?tabs=v3&pivots=programming-language-csharp