Camunda DeploymentCache management


We're experiencing a memory problem in our Spring Boot application. The heap dump shows that most of the heap is consumed by the Camunda component org.camunda.bpm.engine.impl.persistence.deploy.cache.DeploymentCache (69.47%). Within DeploymentCache, the dominant object is org.camunda.bpm.engine.impl.persistence.deploy.cache.BpmnModelInstanceCache (69.03%).

It turns out that Camunda loads all deployed process definitions, case definitions, and decision definitions from the DB tables at startup, and we have tens of thousands of process definitions. To fix the memory problem we're using the method org.camunda.bpm.engine.RepositoryService#deleteDeployment(java.lang.String, boolean), which deletes the data from the Camunda system tables in the DB.

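        // date (cutoff) and maxResult (batch size) are supplied by the caller.
        List<Deployment> oldDeployments = repositoryService.createDeploymentQuery()
                .deploymentBefore(date)
                .listPage(0, maxResult);
        // cascade = true also deletes the process instances, history and jobs of each deployment.
        boolean cascade = true;

        for (Deployment deployment : oldDeployments) {
            repositoryService.deleteDeployment(deployment.getId(), cascade);
        }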

However, this approach gives us little control over the deletion process. That matters because some of our processes carry heavy payment logic. If an incident occurs, such a process can take much longer than expected to complete, and we can't delete those processes. They are the exception rather than the rule, though.

So, is there another way to do this that gives us more control over the deletion process or the DeploymentCache? For example:

  • Adding a handler that queries additional data from the business tables to decide whether a deployment should be deleted.
  • Excluding certain schema names from deletion.
  • Customizing DeploymentCache so that on startup it loads only the deployments for a specified period and loads the others on demand, or specifying a DeploymentCache limit.

Camunda Version: 7.13.0


Solution

  • I would not focus on the symptom, e.g. by addressing it at the deployment cache level, but try to get the environments cleaned up (regularly) and avoid the root cause in the future.

    Establish a cleanup strategy

    • get an idea of what is already cleanable using the report.
    • ensure the historyTimeToLive (TTL) is set on your process definitions via Cockpit or API
    • if required, set the TTL on process definitions that are already deployed in the runtime via Cockpit or API
    • if required, set the removal time on historic instances, e.g. to an absolute value in the past
    • run cleanup for historic instances you want to remove
    • consider using process instance migration to move instances from deployments with low instance numbers to newer versions

    As a result, you should end up with many deployments having 0 process instances (they have been cleaned, also from history, or migrated if running).

    Delete deployments having 0 process instances from DB

    1. Get the list of deployments: https://docs.camunda.org/manual/7.18/reference/rest/deployment/get-query/
    2. Count the process instances a deployment has: https://docs.camunda.org/manual/7.18/reference/rest/process-instance/get-query-count/
    3. Delete a deployment: https://docs.camunda.org/manual/7.18/reference/rest/deployment/delete-deployment/

    Steps 1-3 are also available via the deployment screen in Camunda Cockpit.
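Steps 1-3 can also be scripted. The following is only a minimal sketch of the loop, with an exclusion check of the kind asked about in the question. `DeploymentStore` and `isExcluded` are hypothetical names introduced here so the logic can be shown without the engine on the classpath; they are not Camunda types. In a Camunda-backed implementation the three methods could plausibly delegate to `repositoryService.createDeploymentQuery()`, `runtimeService.createProcessInstanceQuery().deploymentId(id).count()` and `repositoryService.deleteDeployment(id, cascade)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Sketch: delete only deployments that have no process instances and are not excluded. */
class DeploymentCleanup {

    /** Hypothetical port over the engine (assumption, not a Camunda interface). */
    interface DeploymentStore {
        /** Step 1: ids of the candidate deployments. */
        List<String> findDeploymentIds();

        /** Step 2: number of process instances belonging to the deployment. */
        long countProcessInstances(String deploymentId);

        /** Step 3: delete the deployment. */
        void delete(String deploymentId);
    }

    /**
     * Walks all candidate deployments and deletes those that are neither
     * excluded by the caller's rule nor still referenced by process instances.
     *
     * @return the ids that were deleted, in the order they were processed
     */
    static List<String> deleteEmptyDeployments(DeploymentStore store, Predicate<String> isExcluded) {
        List<String> deleted = new ArrayList<>();
        for (String id : store.findDeploymentIds()) {
            if (isExcluded.test(id)) {
                continue; // business rule says: keep, e.g. payment processes
            }
            if (store.countProcessInstances(id) > 0) {
                continue; // still has instances, clean up or migrate them first
            }
            store.delete(id);
            deleted.add(id);
        }
        return deleted;
    }
}
```

The predicate is where a lookup against business tables could go; keeping it separate from the loop means the deletion rule can change without touching the engine calls.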

    Identify the root cause

    Hundreds - ok. Thousands - maybe. Tens of thousands seems an order of magnitude too high. Are you possibly deploying duplicates when there have been no model changes? Are you maybe creating a deployment for every single model instead of bundling them in one deployment? Are you possibly doing something programmatically which performs minor changes and creates lots of deployments?
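If duplicates turn out to be the cause, the engine can filter them itself: `DeploymentBuilder` offers `enableDuplicateFiltering(true)`, which skips resources that have not changed. To illustrate the idea only, here is a small engine-independent sketch that compares content digests before deciding to deploy. `DeploymentChangeDetector` is a hypothetical helper, not part of Camunda, and its in-memory map is an assumption made for brevity (it is lost on restart, so a real check would have to compare against what is already deployed).

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Sketch: remember one digest per resource name and report whether a redeploy is needed. */
class DeploymentChangeDetector {

    // Last seen SHA-256 digest per resource name (in-memory only, an assumption of this sketch).
    private final Map<String, byte[]> lastDigestByResource = new HashMap<>();

    /** Returns true if the resource is new or its content differs from the previous call. */
    boolean hasChanged(String resourceName, byte[] content) {
        byte[] digest = sha256(content);
        byte[] previous = lastDigestByResource.put(resourceName, digest);
        return previous == null || !Arrays.equals(previous, digest);
    }

    private static byte[] sha256(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(content);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

A deployment step would then call `hasChanged(name, bytes)` for each model and create a deployment only when it returns true.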


    Side note on version upgrades

    Camunda 7.13.0 is ~2.5 years old and unpatched. There have been significant security-relevant fixes in the releases since then. Even if you don't have access to the patch releases on Community Edition, an upgrade to 7.18.0 would get you a lot of patches (and also features).

    Upgrading your Camunda version will also allow you to upgrade to a newer supported Spring Boot version, which will again get you lots of security and other fixes.

    The version you are running would, for instance, be vulnerable to the infamous Log4Shell vulnerability.