I recently deployed an application to GCP App Engine in the flexible environment with autoscaling settings. As far as I know, after a period of inactivity (that is, no requests), the instance associated with the App Engine service shuts down until a new request is received. However, in my case the instance is always "available", and after reviewing the GCP documentation I can't figure out what I'm doing wrong. Is it possible that the health check or readiness check is preventing the instance from shutting down?
Here is my app.yaml:
runtime: python
env: flex
entrypoint: gunicorn -k uvicorn.workers.UvicornWorker -b :$PORT app:app
runtime_config:
  operating_system: "ubuntu22"
  runtime_version: "3.10"
env_variables:
  GCP_ENV: True
  CLASSIFIER_MODEL: "/mnt/ramdisk1/Model/classifier_model"
  LOGGER_CONFIG: "logging.conf"
  SENTENCE_MODEL: "/mnt/ramdisk1/Model/sentence_model"
  SPANISH_DIC: "spanish.txt"
resources:
  cpu: 1
  memory_gb: 6
  disk_size_gb: 20
  volumes:
  - name: ramdisk1
    volume_type: tmpfs
    size_gb: 2
automatic_scaling:
  cpu_utilization:
    target_utilization: 0.95
  max_num_instances: 1
And this is the configuration applied to the deployment (shown in the Services section of the App Engine dashboard in GCP):
runtime: python
api_version: '1'
env: flexible
threadsafe: true
env_variables:
  CLASSIFIER_MODEL: /mnt/ramdisk1/Model/classifier_model
  GCP_ENV: 'True'
  LOGGER_CONFIG: logging.conf
  SENTENCE_MODEL: /mnt/ramdisk1/Model/sentence_model
  SPANISH_DIC: spanish.txt
automatic_scaling:
  cool_down_period: 120s
  min_num_instances: 1
  max_num_instances: 1
  cpu_utilization:
    target_utilization: 0.95
resources:
  cpu: 1
  memory_gb: 6
  disk_size_gb: 20
  volumes:
    - volume_type: tmpfs
      size_gb: 2
      name: ramdisk1
liveness_check:
  initial_delay_sec: '300'
  check_interval_sec: '30'
  timeout_sec: '4'
  failure_threshold: 4
  success_threshold: 2
readiness_check:
  check_interval_sec: '5'
  timeout_sec: '4'
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: '300'
service_account: service-account
flexible_runtime_settings:
  operating_system: ubuntu22
  runtime_version: '3.10'
Does anybody see something that I don't?
Thanks in advance for any clue.
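The flexible environment does not support scaling to zero; see, for example, this comparison table.
You therefore have a minimum of 1 instance running at all times, which matches the min_num_instances: 1 shown in your deployed configuration. That minimum comes from the platform rather than from the health or readiness checks.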
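To make that floor visible in the file itself, the automatic_scaling block from the question could state the minimum explicitly. This is only a sketch of how it might look, assuming the flexible environment's app.yaml rule that min_num_instances must be 1 or greater. The other values are copied from the question.

```yaml
automatic_scaling:
  # Assumption: flex rejects 0 here, so 1 is the lowest usable value.
  min_num_instances: 1
  max_num_instances: 1
  cpu_utilization:
    target_utilization: 0.95
```

Declaring it does not change the behavior you are seeing. It only documents that a single instance stays up even with no traffic.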