Tags: python, multithreading, google-app-engine, google-cloud-pubsub, app-engine-flexible

Google Cloud Python Flexible Environment Multithreaded database worker freezes


I run a service under heavy load on the Google App Engine Python Flexible Environment, with PSQ workers handling tasks delivered through Pub/Sub.

This is all fine and dandy as long as I work with single-threaded workers. On a single-threaded worker, if I instantiate a Datastore client like so:

from google.cloud import datastore
_client = datastore.Client(project='project-name-kept-private')

... and retrieve an entity:

entity = _client.get(_client.key('EntityKind', 1234))

... it works fine.

However, once I do this exact same thing in a multi-threaded worker, it freezes on the last line:

entity = _client.get(_client.key('EntityKind', 1234))

I know it fails exactly on this line because I use logging.error before and after that specific line, like so:

import logging
logging.error('entity test1')
entity = _client.get(_client.key('EntityKind', 1234))
logging.error('entity test2')

Both entity test1 and entity test2 appear in the logs on a single-threaded worker, but only entity test1 is printed on a multi-threaded worker. The task never finishes – it just gets stuck on that line.

Any advice or pointers in the right direction would be of great help. I've been struggling with this issue for quite some time now.


Solution

  • I figured out what the problem was: when the Datastore client constructs its underlying API client, it uses gRPC by default, and this apparently freezes when used from multi-threaded workers.
    Setting the GOOGLE_CLOUD_DISABLE_GRPC environment variable forces the client to fall back to the HTTPDatastoreAPI instead, which 'fixes' my problem.
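A minimal sketch of the workaround, assuming the variable is read at import time by the google-cloud client library (so it must be set before `google.cloud.datastore` is imported; setting it after the client module has loaded may have no effect):

```python
import os

# Disable the gRPC transport for google-cloud clients.
# This must happen BEFORE `from google.cloud import datastore`,
# because the transport (gRPC vs. HTTP/JSON) is selected when
# the client module is loaded.
os.environ['GOOGLE_CLOUD_DISABLE_GRPC'] = 'True'

# After this point, `from google.cloud import datastore` followed by
# `datastore.Client(...)` should use the HTTP-based Datastore API
# rather than gRPC, avoiding the multi-threaded freeze described above.
```

On App Engine Flexible it may be cleaner to set the variable in `app.yaml` under the `env_variables:` section instead of in code, so it is in place before any Python module is imported.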