google-app-engine google-cloud-platform cron-task app.yaml

Why is my task failing in Google's App Engine?


About 3-4 times a week, one of my two 12hr tasks, which acts as an ETL from an API endpoint to a Snowflake DB, fails, and I can't figure out exactly why.

The Cron Task Manager says it last ran at 6:29am this morning, but when I retrieve the logs there is only one line, which says:

This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.

I'm not sure whether I need a warm-up request, to allocate specific workers, etc., because that single log line is so uninformative to me. I'm using a pretty sizable instance class that I was hoping could handle most of the workload.

Here is what the logs of a successful run look like:

https://github.com/markamcgown/GF/blob/main/downloaded-logs-success2.csv

And the failure:

https://github.com/markamcgown/GF/blob/main/downloaded-logs-20210104-074656.csv

app.yaml:

service: vetdata-loader
runtime: python38

instance_class: F4_1G

handlers:

- url: /task/loader
  script: auto

Update: here is my most recent app.yaml, which fails less often now but still fails sometimes:

service: vetdata-loader
runtime: python38

instance_class: B4_1G

handlers:

- url: /task/loader
  script: auto

basic_scaling:
  max_instances: 11
  idle_timeout: 30m

Solution

  • I think you are not using the correct instance class. According to the documentation on request timeouts and task calls, a task request is limited to 10 minutes with automatic scaling, and can run for up to 24h with basic and manual scaling.

    Your original instance_class, F4_1G, is an F-type class, which is meant for automatic scaling. Use a B4_1G instance class instead (B-type classes go with basic or manual scaling) and check whether you still have these issues. You should not.
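
    As a rough illustration of the rule above, here is a minimal Python sketch that maps an instance class name to its scaling type and to the task deadline quoted in this answer. The function names and constants are hypothetical helpers written for this example, not part of any App Engine API, and the limits are the ones stated above (10 minutes for automatic scaling, 24h for basic and manual scaling).

```python
# Deadlines for task/cron requests as quoted in the answer above (in minutes).
# These constants are illustrative; check the current App Engine docs.
AUTOMATIC_DEADLINE_MIN = 10
BASIC_MANUAL_DEADLINE_MIN = 24 * 60


def scaling_for(instance_class: str) -> str:
    """Return the scaling family implied by an instance class name."""
    prefix = instance_class.strip().upper()[:1]
    if prefix == "F":
        return "automatic"
    if prefix == "B":
        return "basic_or_manual"
    raise ValueError(f"unrecognized instance class: {instance_class!r}")


def max_task_minutes(instance_class: str) -> int:
    """Return the task request deadline (minutes) for the instance class."""
    if scaling_for(instance_class) == "automatic":
        return AUTOMATIC_DEADLINE_MIN
    return BASIC_MANUAL_DEADLINE_MIN


def fits_deadline(instance_class: str, expected_minutes: float) -> bool:
    """True if a job of the expected duration fits within the deadline."""
    return expected_minutes <= max_task_minutes(instance_class)


if __name__ == "__main__":
    # A hypothetical 45-minute ETL run, checked against both configurations.
    for cls in ("F4_1G", "B4_1G"):
        print(cls, scaling_for(cls), fits_deadline(cls, 45))
```

    With these assumptions, a load that runs longer than 10 minutes does not fit on F4_1G but does fit on B4_1G, which matches the change from the first app.yaml to the second one in the question.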