About 3-4 times a week, one of my two 12-hour tasks, which acts as an ETL from an API endpoint to a Snowflake DB, fails, and I can't figure out exactly why.
The Cron Task Manager says it last ran at 6:29am this morning, but when I retrieve the logs there's only one line, which says:
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
I'm not sure whether I need a warm-up (see my sketch after the app.yaml below), to allocate specific workers, etc., because the single log line is so uninformative. I'm using a fairly sizable instance class that I was hoping could handle most of the workload.
Here is what the logs of a successful run look like:
https://github.com/markamcgown/GF/blob/main/downloaded-logs-success2.csv
And the failure:
https://github.com/markamcgown/GF/blob/main/downloaded-logs-20210104-074656.csv
app.yaml:
service: vetdata-loader
runtime: python38
instance_class: F4_1G
handlers:
- url: /task/loader
  script: auto
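For reference, my understanding from the docs is that a warm-up would mean enabling warmup requests and handling GET /_ah/warmup, roughly like the sketch below. I haven't tried this yet, and I believe warm-up only applies to automatic scaling (the F instance classes):

service: vetdata-loader
runtime: python38
instance_class: F4_1G
inbound_services:
- warmup  # ask App Engine to send a request to /_ah/warmup when it spins up an instance
handlers:
- url: /_ah/warmup
  script: auto
- url: /task/loader
  script: auto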
Update: here is my most recent app.yaml, which is failing less often now, but still sometimes:
service: vetdata-loader
runtime: python38
instance_class: B4_1G
handlers:
- url: /task/loader
  script: auto
basic_scaling:
  max_instances: 11
  idle_timeout: 30m
I think you are not using the correct instance class. If you have a look here at the timeouts for task calls, you are limited to a 10-minute call with automatic scaling, and up to 24 hours with basic and manual scaling.
If I look at your instance_class, the FXXX type is suited to automatic scaling. Use a B4_1G instance class instead and check whether you still have these issues. You should not.
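In practice that would look something like the sketch below. Note that a B-class instance requires you to declare basic_scaling (or manual_scaling) in the same file; the max_instances and idle_timeout values here are only examples to tune for your workload:

service: vetdata-loader
runtime: python38
instance_class: B4_1G
basic_scaling:
  max_instances: 1   # example value; a single long-running loader may only need one
  idle_timeout: 30m  # example value; how long an idle instance stays up before shutdown
handlers:
- url: /task/loader
  script: auto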