c# azure azure-cosmosdb azure-cosmosdb-changefeed

Is it possible to monitor whether an Azure Function processes the Cosmos DB Change Feed reliably?


I have an Azure Function that is triggered by the Cosmos DB Change Feed and ETLs each document into MS SQL. When a server (either Cosmos DB or the Azure Functions host) is under heavy load, the feed is delayed by a few seconds to several minutes, and changes are sometimes lost. As a workaround for the lost changes, I currently re-sync the data daily with Data Factory.

Now I want to add monitoring metrics, such as delay time and success or failure counts, so that I can decide whether to scale up DTUs, or use them for other analysis. I would rather not query both Cosmos DB and SQL and compare them, although a simple count query on each side remains the last resort for detecting lost changes.

Is this possible?


Solution

  • For health monitoring, you can have the Trigger's health monitor logs sent to your App Insights: https://learn.microsoft.com/azure/cosmos-db/how-to-configure-cosmos-db-trigger-logs

    Enable the logs in host.json:

    {
      "version": "2.0",
      "logging": {
        "fileLoggingMode": "always",
        "logLevel": {
          "Host.Triggers.CosmosDB": "Trace"
        }
      }
    }
    

    This surfaces any critical errors that happen inside the Trigger while it tries to checkpoint (for example, the lease collection has been deleted).

    As for delays in getting changes, the most common reasons are detailed here: https://learn.microsoft.com/azure/cosmos-db/troubleshoot-changefeed-functions#my-changes-take-too-long-to-be-received

    Keep in mind that in most cases a new batch of changes is read only after the current execution has finished processing the current batch. If you follow the Functions best practices and keep your Functions slim, this normally isn't an issue. But if your Function's processing time grows non-linearly with batch size (for example, 10 events take 1 second but 50 events take 30 seconds, which you can see per Function execution in App Insights), that might point to undesired complexity in the Function's code.
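
    To get the delay time and the success or failure counts asked about in the question, one option is to emit custom metrics from the trigger Function itself. The following is only a minimal sketch, assuming the in-process Functions model with the DocumentClient-based Cosmos DB extension (the same one the Monitor snippet in this answer uses). The function name, the metric names, and the `WriteToSqlAsync` helper are hypothetical placeholders, and the lease collection name "leases" is borrowed from the Monitor snippet:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Threading.Tasks;
    using Microsoft.Azure.Documents;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Extensions.Logging;

    public static class EtlFunction
    {
        [FunctionName("Etl")] // hypothetical name
        public static async Task Run(
            [CosmosDBTrigger("%MonitoredDatabase%", "%MonitoredCollection%",
                ConnectionStringSetting = "CosmosDB",
                LeaseCollectionName = "leases")] IReadOnlyList<Document> documents,
            ILogger log)
        {
            if (documents == null || documents.Count == 0)
            {
                return;
            }

            // Document.Timestamp reflects the server-side _ts value (second precision),
            // so the oldest timestamp in the batch gives a rough upper bound on the delay.
            DateTime oldestChange = documents.Min(d => d.Timestamp);
            double lagSeconds = (DateTime.UtcNow - oldestChange).TotalSeconds;

            var stopwatch = Stopwatch.StartNew();
            try
            {
                await WriteToSqlAsync(documents); // hypothetical ETL step
                log.LogMetric("EtlDocumentsSucceeded", documents.Count);
            }
            catch (Exception ex)
            {
                log.LogError(ex, "ETL batch of {Count} documents failed", documents.Count);
                log.LogMetric("EtlDocumentsFailed", documents.Count);
                throw; // rethrow so the execution is recorded as failed
            }
            finally
            {
                // Metric names are illustrative; pick whatever fits your dashboards.
                log.LogMetric("ChangeFeedLagSeconds", lagSeconds);
                log.LogMetric("EtlBatchDurationMs", stopwatch.ElapsedMilliseconds);
            }
        }

        // Placeholder for the real SQL write; not part of any SDK.
        private static Task WriteToSqlAsync(IReadOnlyList<Document> documents)
        {
            return Task.CompletedTask;
        }
    }
    ```

    Charting the batch duration against the batch size in App Insights should make the non-linear case described above easy to spot, and the lag metric gives a per-execution view of how late the changes arrive.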

    You can also wire the Change Feed Estimator into another Function, which exposes how far behind the Trigger is lagging in the Change Feed: https://medium.com/microsoftazure/azure-cosmos-db-functions-cookbook-monitoring-trigger-pending-work-800b24589235

    The idea is to use a TimerTrigger and combine it with the existing Trigger configuration to create an Estimator and consume it (see the post for the full description):

    [FunctionName("Monitor")]
    public static async Task Monitor(
        [TimerTrigger("*/1 * * * * *", RunOnStartup = true)] TimerInfo timer, // Timer will trigger every 1 second, adjust CRON expression
        [CosmosDB("%MonitoredDatabase%", "%MonitoredCollection%", ConnectionStringSetting = "CosmosDB")] DocumentClient monitoredCollectionClient, 
        [CosmosDB("%MonitoredDatabase%", "leases", ConnectionStringSetting = "CosmosDB")] DocumentClient leaseCollectionClient,
        ILogger log)
    {
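        // GetRemainingWorkEstimator is a helper described in the linked post, not an SDK method
        var estimator = GetRemainingWorkEstimator(monitoredCollectionClient, leaseCollectionClient);
        // Estimated amount of pending work the Trigger has not processed yet
        var remainingWork = await estimator.GetEstimatedRemainingWork();
        // Send a custom metric to App Insights
        log.LogInformation(remainingWork.ToString());
        log.LogMetric("RemainingWork", remainingWork);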
    }