Tags: python, multithreading, google-cloud-platform, google-cloud-datastore

Is there a way to execute a blocking get() call in Google Cloud Datastore?


I have an operation that requires the following:

  1. Get entity from key, using Google Cloud Datastore
  2. Do resource (CPU/memory) intensive work for ~10 seconds
  3. Update entity with results in Google Cloud Datastore

Ideally, to minimize resource use, I don't want the program to even start step #2 if another worker is already in the middle of processing #2.

That would mean the get() call would block until no one else is processing #2.

My understanding, from the docs and from experimenting with Datastore's Transaction, is that contention checks don't happen until the commit() call. Only then is an error thrown and the transaction rolled back. But that means every worker executes the expensive step #2 before realizing that someone else is already doing the work.
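To make the problem concrete, here is a small pure-Python simulation of the commit-time conflict check described above. `FakeDatastore` and `ConflictError` are hypothetical stand-ins, not the google-cloud-datastore client. The sketch assumes optimistic concurrency, where a read records the version it saw and the commit fails if that version has changed since.

```python
import threading
from dataclasses import dataclass


class ConflictError(Exception):
    """Raised when a commit finds that the entity changed after it was read."""


@dataclass
class Versioned:
    value: dict
    version: int


class FakeDatastore:
    """Hypothetical in-memory store with optimistic, commit-time conflict checks."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()  # guards _data only; never held during the work

    def put(self, key, value):
        with self._mutex:
            old = self._data.get(key)
            self._data[key] = Versioned(dict(value), old.version + 1 if old else 1)

    def get(self, key):
        # Reads never block: they only remember which version they saw.
        with self._mutex:
            entry = self._data[key]
            return dict(entry.value), entry.version

    def commit(self, key, value, read_version):
        # The conflict check happens only here, after the expensive work.
        with self._mutex:
            entry = self._data[key]
            if entry.version != read_version:
                raise ConflictError(f"{key}: read v{read_version}, now v{entry.version}")
            self._data[key] = Versioned(dict(value), entry.version + 1)


work_done = []


def expensive_step(worker):
    work_done.append(worker)  # stand-in for ~10 s of CPU/memory-heavy work
    return {"result": f"computed by {worker}"}


store = FakeDatastore()
store.put("job-1", {"result": None})

# Two workers read the same entity before either commits.
_, a_version = store.get("job-1")
_, b_version = store.get("job-1")
a_result = expensive_step("A")
b_result = expensive_step("B")

store.commit("job-1", a_result, a_version)
try:
    store.commit("job-1", b_result, b_version)
    outcome = "B committed"
except ConflictError:
    outcome = "B rolled back"

print(work_done, outcome)  # ['A', 'B'] B rolled back
```

Both workers pay for the expensive step; only the commit tells B that its work was wasted.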

Is there a way to get the get() call to block if anyone else has a Transaction using that key?

In this article, they utilize memcache. However, I'd prefer to keep it native to Cloud Datastore if possible, to minimize additional infrastructure.


Solution

  • The purpose of a transaction is to stage your writes and apply them when the transaction is committed, or discard them when it is rolled back.

    Reads are not blocked; only the write operation is.

    There are two possible solutions:

    1. Store the ID of the entity being processed somewhere shared, for example in an in-memory store such as Memorystore, and skip any entity whose ID is already there.
    2. Rely on the processing duration and use this process:

      • get the entity
      • write a field in your entity (any field will do)
      • set a timeout on that write, for example 500 ms (easy to do in Go; I'm not sure about other languages). If the timeout is reached, a transaction is already in progress on this entity, so skip it. If not, continue
      • create a transaction
      • perform your intensive processing
      • write the result
      • commit the transaction
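Option 1 can be sketched as a lease: a worker claims the entity ID in a shared store before step #2, and other workers skip that entity. `LeaseStore` and `process` below are hypothetical; `LeaseStore` is an in-memory stand-in for a shared store, and with Memorystore for Redis the equivalent atomic claim would be `SET key owner NX PX ttl`. The controllable clock shows one race that remains: if the work outlives the lease, a second worker can claim the same entity.

```python
import threading
import time


class LeaseStore:
    """Hypothetical in-memory stand-in for a shared store such as Memorystore.

    try_acquire mimics Redis `SET key owner NX PX ttl`: it succeeds only if
    no unexpired lease exists for the key. This is not a real client.
    """

    def __init__(self, clock=time.monotonic):
        self._leases = {}  # key -> (owner, expires_at)
        self._mutex = threading.Lock()  # makes check-and-set atomic in this process
        self._clock = clock

    def try_acquire(self, key, owner, ttl_s):
        with self._mutex:
            now = self._clock()
            current = self._leases.get(key)
            if current is not None and current[1] > now:
                return False  # another worker is already processing this entity
            self._leases[key] = (owner, now + ttl_s)
            return True

    def release(self, key, owner):
        with self._mutex:
            current = self._leases.get(key)
            if current is not None and current[0] == owner:
                del self._leases[key]  # only the current owner may release


def process(store, key, owner, work, ttl_s=30.0):
    """Run `work` only if this worker wins the lease; otherwise skip step #2."""
    if not store.try_acquire(key, owner, ttl_s):
        return None
    try:
        return work()  # get entity, ~10 s of work, update entity
    finally:
        store.release(key, owner)


# Demo with a controllable clock instead of time.monotonic.
now = [0.0]
store = LeaseStore(clock=lambda: now[0])
first = store.try_acquire("job-1", "A", ttl_s=30)   # True: A claims the entity
second = store.try_acquire("job-1", "B", ttl_s=30)  # False: B skips step #2
now[0] = 31.0                                       # A's work overruns its lease
third = store.try_acquire("job-1", "B", ttl_s=30)   # True: now A and B both work
print(first, second, third)  # True False True
```

The owner check in `release` stops a worker whose lease has expired from deleting someone else's lease. Choosing a TTL comfortably above the ~10 s of work narrows the overrun window but does not remove it.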

    However, in both cases you still have a race condition if two workers start at the same time.
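The timeout probe from option 2, and the race that remains, can be modelled in plain Python. This is a hypothetical sketch, not Datastore code: `EntityLocks` assumes that an in-progress transaction holds a per-entity lock that makes other writes wait (the behaviour the option relies on), and `threading.Lock.acquire(timeout=...)` plays the role of the 500 ms write timeout.

```python
import threading


class EntityLocks:
    """Hypothetical model: an in-progress transaction holds a per-entity lock,
    and any other write to that entity waits for it. Not the Datastore API."""

    def __init__(self):
        self._locks = {}
        self._mutex = threading.Lock()

    def lock_for(self, key):
        with self._mutex:
            return self._locks.setdefault(key, threading.Lock())


def probe_write(locks, key, timeout_s=0.5):
    """Write a marker field, giving up after timeout_s.

    Returns True if the write went through (nobody holds the entity),
    False if it timed out (a transaction is presumably in progress)."""
    lock = locks.lock_for(key)
    if not lock.acquire(timeout=timeout_s):
        return False
    lock.release()  # the probe write is done; it does not keep the lock
    return True


locks = EntityLocks()

# Case 1: worker A is inside its transaction, so B's probe times out and B skips.
txn_lock = locks.lock_for("job-1")
txn_lock.acquire()                                   # A: create transaction, start work
b_skips = not probe_write(locks, "job-1", timeout_s=0.1)
txn_lock.release()                                   # A: commit

# Case 2: the race. A and B both probe before either has opened a transaction,
# so both probes succeed and both go on to do the expensive work.
a_ok = probe_write(locks, "job-2")
b_ok = probe_write(locks, "job-2")
print(b_skips, a_ok and b_ok)  # True True
```

Case 2 is the gap between a successful probe and the start of the transaction: nothing stops a second worker from probing inside that gap, so this approach reduces duplicated work rather than preventing it.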