google-cloud-platform google-app-engine app-engine-flexible

How to configure GCP App Engine to shut down properly after a period of inactivity

I recently deployed an application to GCP App Engine using the flexible environment with autoscaling settings. As far as I know, after a period of inactivity (that is, no requests), the instance associated with the App Engine service shuts down until a new request is received. In my case, however, the instance is always "available", and after reviewing the GCP documentation I can't figure out what I'm doing wrong. Is it possible that the health check or readiness check is preventing the instance from shutting down?

Here is my app.yaml

runtime: python
env: flex
entrypoint: gunicorn -k uvicorn.workers.UvicornWorker -b :$PORT app:app

runtime_config:
    operating_system: "ubuntu22"
    runtime_version: "3.10"

env_variables:
  GCP_ENV: True
  CLASSIFIER_MODEL: "/mnt/ramdisk1/Model/classifier_model"
  LOGGER_CONFIG: "logging.conf"
  SENTENCE_MODEL: "/mnt/ramdisk1/Model/sentence_model"
  SPANISH_DIC: "spanish.txt"

resources:
  cpu: 1
  memory_gb: 6
  disk_size_gb: 20
  volumes:
    - name: ramdisk1
      volume_type: tmpfs
      size_gb: 2

automatic_scaling:
  cpu_utilization:
    target_utilization: 0.95
  max_num_instances: 1

And this is the configuration applied to the deployment (shown in the Services section of the App Engine dashboard in GCP):

runtime: python
api_version: '1'
env: flexible
threadsafe: true
env_variables:
  CLASSIFIER_MODEL: /mnt/ramdisk1/Model/classifier_model
  GCP_ENV: 'True'
  LOGGER_CONFIG: logging.conf
  SENTENCE_MODEL: /mnt/ramdisk1/Model/sentence_model
  SPANISH_DIC: spanish.txt
automatic_scaling:
  cool_down_period: 120s
  min_num_instances: 1
  max_num_instances: 1
  cpu_utilization:
    target_utilization: 0.95
resources:
  cpu: 1
  memory_gb: 6
  disk_size_gb: 20
  volumes:
    - volume_type: tmpfs
      size_gb: 2
      name: ramdisk1
liveness_check:
  initial_delay_sec: '300'
  check_interval_sec: '30'
  timeout_sec: '4'
  failure_threshold: 4
  success_threshold: 2
readiness_check:
  check_interval_sec: '5'
  timeout_sec: '4'
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: '300'
service_account: service-account
flexible_runtime_settings:
  operating_system: ubuntu22
  runtime_version: '3.10'

Does anybody see something that I don't?

Thanks in advance for any clue.

Solution

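The flexible environment does not support scaling to zero; see, for example, this comparison table.

You will therefore always have at least one instance running, no matter how long the service sits idle. This matches the min_num_instances: 1 in your deployed configuration, so the liveness and readiness checks are not what is keeping the instance up.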
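
To make that floor explicit in app.yaml, here is a minimal sketch of the scaling section, assuming the rest of your file stays as posted. As far as I know, min_num_instances must be 1 or greater in the flexible environment, so 0 is not an accepted value; treat the comment below as my understanding rather than a quote from the documentation.

```yaml
automatic_scaling:
  # Flexible environment: the lowest allowed value is 1 (assumption stated
  # above); this is the instance that stays "available" while idle.
  min_num_instances: 1
  max_num_instances: 1
  cpu_utilization:
    target_utilization: 0.95
```

If you need the service to cost nothing while idle, the standard environment can scale to zero instances, but you would have to check whether your 6 GB memory requirement and the tmpfs volume fit there.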