Tags: azure, azure-functions, azure-cosmosdb, azure-cosmosdb-changefeed

CosmosDB lease collection is no longer being created automatically


I'm having a very strange problem with CosmosDB and Azure Functions. I frequently delete and re-create my database in DEV, then re-deploy the function app. Normally, when I call the app's APIs and the CosmosDB triggers are invoked, the leases collection gets created. Here's a typical trigger:

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;

public static class MyFunctions
{
    [FunctionName("MyTrigger")]
    public static async Task RunAsync([CosmosDBTrigger("MyDatabase", "MyContainer",
        ConnectionStringSetting = "CosmosConnectionString", LeaseCollectionName = "leases",
        LeaseCollectionPrefix = "MyTrigger", CreateLeaseCollectionIfNotExists = true)] IReadOnlyList<Document> documents,
        ExecutionContext executionContext)
    {
        // code
    }
}

For some reason, the leases collection is no longer being created. I re-created the database, re-deployed the function app multiple times and made API calls with no luck. What am I missing?

EDIT: I looked at the logs and noticed a lot of Microsoft.Azure.Documents.ChangeFeedProcessor.Exceptions.LeaseLostException exceptions with the message "The lease was lost", so I'm not sure what's going on.

EDIT2: Here's a more detailed error message I was able to extract from the logs:

"Either the source collection 'MyContainer' (in database 'MyDatabase') or the lease collection 'leases' (in database 'MyDatabase') does not exist. Both collections must exist before the listener starts. To automatically create the lease collection, set 'CreateLeaseCollectionIfNotExists' to 'true'."

Note that CreateLeaseCollectionIfNotExists is already set to true.


Solution

  • The "Either the source collection..." error comes from here: https://github.com/Azure/azure-webjobs-sdk-extensions/blob/0683d1bd08a16680c70f982ad00c940b7e9c1fce/src/WebJobs.Extensions.CosmosDB/Trigger/CosmosDBTriggerListener.cs#L140, which reacts to a NotFound response detected while trying to start the Trigger process.

    The key here is understanding that the leases collection is created during Function initialization, not while the Function is running.

    If you delete the leases collection (or the monitored collection) while the Function is running, you might see that error raised by the running instances. If a new instance comes up (due to scaling) or you restart the Function, the creation logic in https://github.com/Azure/azure-webjobs-sdk-extensions/blob/0683d1bd08a16680c70f982ad00c940b7e9c1fce/src/WebJobs.Extensions.CosmosDB/Trigger/CosmosDBTriggerAttributeBindingProvider.cs#L155 kicks in.

    So, when do these errors happen?

    1. Function initialization: the CreateLeaseCollectionIfNotExists check runs and creates the leases collection if it is missing. If this fails, initialization stops there and an error message is produced.
    2. Function running: if the leases collection is deleted while instances are running, the resulting runtime errors make the Function retry starting the process. Because the retry does not run initialization again, it outputs the "Either the source collection..." error.
    3. An occasional "The lease was lost" error occurs in load-balancing scenarios, where multiple Function instances are running and sharing the scaled load, when a lease (from the leases collection) is reassigned to a new instance. It can also happen if the Trigger tries to update its checkpoint right after you delete the leases collection.
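To make the initialization-versus-retry distinction concrete, here is a toy model in C#. It is a hypothetical simplification written for illustration only, not the WebJobs extension's actual code: `FakeDatabase`, `TriggerListenerModel`, `Initialize` and `RetryStart` are invented names, and an in-memory set of collection names stands in for Cosmos DB.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a Cosmos DB database:
// it only tracks which collections currently exist.
public sealed class FakeDatabase
{
    public HashSet<string> Collections { get; } = new HashSet<string>();
}

// Hypothetical, simplified model of the trigger listener lifecycle.
// This is NOT the extension's real implementation.
public sealed class TriggerListenerModel
{
    private readonly FakeDatabase _db;
    private readonly string _monitored;
    private readonly string _leases;
    private readonly bool _createLeaseCollectionIfNotExists;

    public TriggerListenerModel(FakeDatabase db, string monitored, string leases,
        bool createLeaseCollectionIfNotExists)
    {
        _db = db;
        _monitored = monitored;
        _leases = leases;
        _createLeaseCollectionIfNotExists = createLeaseCollectionIfNotExists;
    }

    // Case 1: Function initialization (startup, restart, new scaled-out instance).
    // In this model, this is the only place the leases collection gets created.
    public void Initialize()
    {
        if (_createLeaseCollectionIfNotExists)
        {
            _db.Collections.Add(_leases); // no-op if it already exists
        }
        StartProcessor();
    }

    // Case 2: a retry after a runtime failure only re-runs the start logic,
    // so a leases collection deleted at runtime is NOT re-created here.
    public void RetryStart() => StartProcessor();

    private void StartProcessor()
    {
        if (!_db.Collections.Contains(_monitored) || !_db.Collections.Contains(_leases))
        {
            throw new InvalidOperationException(
                $"Either the source collection '{_monitored}' or the lease collection '{_leases}' does not exist.");
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var db = new FakeDatabase();
        db.Collections.Add("MyContainer");

        var listener = new TriggerListenerModel(db, "MyContainer", "leases", true);
        listener.Initialize();
        Console.WriteLine(db.Collections.Contains("leases")); // True

        // Deleting the leases collection while the Function is running...
        db.Collections.Remove("leases");
        try
        {
            listener.RetryStart();
        }
        catch (InvalidOperationException ex)
        {
            // ...surfaces the "Either the source collection..." error on retry.
            Console.WriteLine(ex.Message);
        }

        // A restart (or a new instance) runs initialization again and re-creates it.
        new TriggerListenerModel(db, "MyContainer", "leases", true).Initialize();
        Console.WriteLine(db.Collections.Contains("leases")); // True
    }
}
```

The model deliberately ignores leases, checkpoints and load balancing; it only captures the point that creation is tied to initialization, so a retry on a running instance cannot fix a deleted leases collection.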

    What you can do

    If you are manually deleting the leases collection, then you are in control of what can happen. The recommendation is:

    1. Stop your Functions.
    2. Delete the leases collection.
    3. Start your Functions.
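If you script this reset, a small helper can enforce the order. This is a minimal C# sketch under stated assumptions: `LeaseReset.ResetLeasesAsync` is a hypothetical helper, and its three delegates are placeholders for whatever stop, delete and start mechanism you already use (for instance, shelling out to the Azure CLI commands `az functionapp stop`, `az cosmosdb sql container delete` and `az functionapp start`, or equivalent management SDK calls).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper enforcing the recommended order:
// stop the Functions, delete the leases collection, start the Functions.
public static class LeaseReset
{
    public static async Task ResetLeasesAsync(
        Func<Task> stopFunctions,
        Func<Task> deleteLeaseCollection,
        Func<Task> startFunctions)
    {
        // 1. Stop first so no running instance touches the lease store.
        await stopFunctions();
        try
        {
            // 2. Delete the leases collection while nothing is running.
            await deleteLeaseCollection();
        }
        finally
        {
            // 3. Start again; initialization re-creates the leases collection
            //    when CreateLeaseCollectionIfNotExists = true.
            //    Runs even if the delete failed, so the app is not left stopped.
            await startFunctions();
        }
    }
}

public static class Program
{
    public static async Task Main()
    {
        // Placeholder delegates that only record the order in which they ran.
        var steps = new List<string>();
        await LeaseReset.ResetLeasesAsync(
            () => { steps.Add("stop"); return Task.CompletedTask; },
            () => { steps.Add("delete"); return Task.CompletedTask; },
            () => { steps.Add("start"); return Task.CompletedTask; });
        Console.WriteLine(string.Join(" -> ", steps)); // stop -> delete -> start
    }
}
```

Restarting in a `finally` block is a choice made for this sketch: if the delete fails, the app is not left stopped, and because the leases collection then still exists, initialization simply finds it in place.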

    If you delete the lease store while the Function is still running, without stopping it first, the Function's behavior is undefined.