spring-boot google-cloud-platform spring-batch google-cloud-run spring-scheduled

Spring Scheduler not working in google cloud run with cpu throttling off

Hello all, I have a Spring scheduler job that has to run on Google Cloud Run at a fixed time interval.

It works perfectly fine with docker-compose local deployment. It gets triggered without any issue.

Although it works fine locally, in a Google Cloud Run service with CPU throttling off (which keeps the CPU allocated at all times) it stops working after the first run.

I will paste the Dockerfile for anyone's reference, but I am pretty sure it is working fine.

FROM maven:3-jdk-11-slim AS build-env

# Set the working directory to /app
WORKDIR /app
COPY pom.xml ./
COPY src ./src
COPY css-common ./css-common

RUN echo $(ls -1 css-common/src/main/resources)

# Build and create the common jar
RUN cd css-common && mvn clean install

# Build the job
RUN mvn package -DskipTests

# It's important to use OpenJDK 8u191 or above that has container support enabled.
# https://docs.docker.com/develop/develop-images/multistage-build/#use-multi-stage-builds
FROM openjdk:11-jre-slim

# Copy the jar to the production image from the builder stage.
COPY --from=build-env /app/target/css-notification-job-*.jar /app.jar

# Run the web service on container startup
CMD ["java","-Djava.security.egd=file:/dev/./urandom","-jar","/app.jar"]

And below is the terraform script used for the deployment

resource "google_cloud_run_service" "job-staging" {
  name     = var.cloud_run_job_name
  project  = var.project
  location = var.region

  template {
    spec {
      containers {
        image = "${var.docker_registry}/${var.project}/${var.cloud_run_job_name}:${var.docker_tag_notification_job}"
        env {
          name  = "DB_HOST"
          value = var.host
        }
        env {
          name  = "DB_PORT"
          value = 3306
        }
      }
    }

    metadata {
      annotations = {
        "autoscaling.knative.dev/maxScale"        = "4"
        "run.googleapis.com/vpc-access-egress"    = "all-traffic"
        "run.googleapis.com/cpu-throttling"       = "false"
      }
    }
  }

  timeouts {
    update = "3m"
  }
}

Something I noticed in the logs:

2022-01-04T00:19:39.178057Z 2022-01-04 00:19:39.177 INFO 1 --- [ionShutdownHook] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
2022-01-04T00:19:39.182017Z 2022-01-04 00:19:39.181 INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown initiated...
2022-01-04T00:19:39.194117Z 2022-01-04 00:19:39.193 INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown completed.

It is shutting down the entity manager. I provided -Xmx1024m heap memory to make sure it has enough memory.

Although the Google documentation says this should work, for some reason the scheduler is not getting triggered. Any help would be much appreciated.

Solution

TL;DR: Using Spring Scheduler on Cloud Run is a bad idea. Prefer Cloud Scheduler instead.

To see why, you have to understand the lifecycle of a Cloud Run instance. First of all, by default CPU is allocated to the process ONLY while a request is being processed.

The immediate effect is that a background process, like a scheduler, can't work, because no CPU is allocated outside of request processing.

Unless you set CPU throttling to off. You did? Great, but there are other caveats!

An instance is created when a request comes in, and it lives for up to 15 minutes without processing any request. Then the instance is shut down and you scale to 0.

Here again, the scheduler can't work if the instance is shut down. The shutdown-hook lines in your logs are consistent with that: the instance was terminated and Spring closed its context. The solution is to set min instances to 1 AND CPU throttling to false, to keep 1 instance up 100% of the time and let the scheduler do its job.

The final issue with Cloud Run is scalability. You set maxScale to 4 in your Terraform, which means you can have up to 4 instances in parallel, and therefore 4 schedulers running in parallel, one on each instance. Is that really what you want? If not, you can set max instances to 1 to limit the number of parallel instances to 1.
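
As a rough sketch, the min/max instance settings described above could be expressed in the metadata block of the question's Terraform as follows. The minScale and maxScale keys are the standard Knative autoscaling annotations used by Cloud Run; the other two annotations are copied from the question. Treat this as an illustration of the idea rather than a tested configuration.

```hcl
    metadata {
      annotations = {
        # Keep exactly one instance alive so the in-process scheduler is never scaled to 0
        "autoscaling.knative.dev/minScale"     = "1"
        # Cap at one instance so only one scheduler runs at a time
        "autoscaling.knative.dev/maxScale"     = "1"
        "run.googleapis.com/vpc-access-egress" = "all-traffic"
        # CPU stays allocated outside of request processing
        "run.googleapis.com/cpu-throttling"    = "false"
      }
    }
```

Note that this pins the service to a single always-on instance, which is billed for the whole time it is up.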

In the end, you have 1 instance that is up full time and that can't scale up or down. Because it can't scale, I don't recommend performing the processing on that instance. Instead, have it call another API, running on another Cloud Run service, that can scale up and down according to the workload.

That way, you have only 1 scheduler, which makes API calls to other Cloud Run services to perform the tasks. That's exactly the purpose of Cloud Scheduler.
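
For illustration, a Cloud Scheduler job that calls the Cloud Run service over HTTP might look roughly like this in Terraform. Several things here are assumptions and not from the question: the resource and job names, the 15-minute cron expression, the /run-job endpoint (the Spring app would need to expose an HTTP endpoint that runs the job), and the variable var.scheduler_service_account_email (a service account allowed to invoke the service). It also assumes var.region is a location where Cloud Scheduler is available.

```hcl
# Hypothetical sketch: trigger the job through an HTTP call instead of an in-process scheduler
resource "google_cloud_scheduler_job" "notification_trigger" {
  name      = "notification-job-trigger" # assumed name
  project   = var.project
  region    = var.region
  schedule  = "*/15 * * * *"             # assumed interval: every 15 minutes
  time_zone = "Etc/UTC"

  http_target {
    http_method = "POST"
    # /run-job is a hypothetical endpoint that the application would have to implement
    uri = "${google_cloud_run_service.job-staging.status[0].url}/run-job"

    # Authenticate the call with an OIDC token for a service account that may invoke the service
    oidc_token {
      service_account_email = var.scheduler_service_account_email # assumed variable
    }
  }
}
```

With this approach the work happens while a request is being processed, so the service no longer depends on CPU being allocated between requests, and it can scale to 0 when idle.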