google-cloud-platform google-cloud-firestore google-cloud-datastore

What is the purpose of OPTIMISTIC_WITH_ENTITY_GROUPS concurrency mode?


Cloud Firestore in Datastore mode documents three concurrency modes: OPTIMISTIC, OPTIMISTIC_WITH_ENTITY_GROUPS, and PESSIMISTIC. https://cloud.google.com/datastore/docs/concepts/transactions#concurrency_modes

It seems that OPTIMISTIC is the recommended mode, but when my legacy Datastore database was automatically upgraded to Cloud Firestore in Datastore mode, it was set to OPTIMISTIC_WITH_ENTITY_GROUPS. I sometimes get the following error: google.api_core.exceptions.Aborted: 409 too much contention on these datastore entities. please try again.

I'm considering switching to OPTIMISTIC, but I don't know whether it will break anything. Why would one need to use OPTIMISTIC_WITH_ENTITY_GROUPS? It seems Datastore supports ancestors and entity groups in any mode (and I don't think I'm using that feature anyway). Is it only necessary for App Engine 1st generation (e.g., the Python 2 runtime)? I have switched to the 2nd generation Python 3 runtime.

Thanks.


Solution

  • There are really only three reasons why you would want to stay with OPTIMISTIC_WITH_ENTITY_GROUPS:

    1. You need writes to be atomic within the whole entity-group. Consider this example:

      • One transaction accesses an entity Foo/1/Bar/1 and a second, concurrent transaction accesses Foo/1/Bar/2. The two transactions touch different entities in the same entity group, Foo/1.
      • Do you want both transactions to succeed? In that case you have to use OPTIMISTIC concurrency.
      • Or do you want one of the transactions to fail with "too much contention"? In that case you need OPTIMISTIC_WITH_ENTITY_GROUPS.
      • The most likely reason why you were migrated to OPTIMISTIC_WITH_ENTITY_GROUPS is that some of your transactions have failed in a scenario like this, and the migration automation couldn't tell whether you relied on this failure occurring. In other words, without knowing your code, it's impossible to say whether a "contention failure" is a bug or a feature for you. So the safe choice is not to change the behavior.
      • To answer that question, you would have to review your application logic and see which transactional guarantees you are relying on. This cannot be automated, but reasoning about it is probably not too hard for people familiar with the different modules of your application.
    2. If you are using entity-group timestamps, you also need OPTIMISTIC_WITH_ENTITY_GROUPS: https://cloud.google.com/appengine/docs/legacy/standard/python/datastore/metadataqueries#entity_group_metadata

      • If this is the case, there is most likely a workaround. For example, you could transactionally write a timestamp yourself every time you make changes to an entity-group. This could potentially be sharded, so you wouldn't necessarily be creating a hot entity key.
      • This is also detected by the migration automation and could be another reason why you ended up with OPTIMISTIC_WITH_ENTITY_GROUPS.
    3. The last reason why you would need OPTIMISTIC_WITH_ENTITY_GROUPS is that you are using Datastore through the Remote API: https://cloud.google.com/appengine/docs/legacy/standard/python/tools/remoteapi

      • The Remote API relies on OPTIMISTIC_WITH_ENTITY_GROUPS and there are currently no plans to address that.
      • If you are using the Remote API without transactions and queries, you could still switch to OPTIMISTIC, but this would be very risky, and you would be better advised to find an alternative to the Remote API.
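
To make the scenario from the first reason concrete, here is a small pure-Python model of where contention is detected in each mode. It is only a sketch and does not talk to Datastore: the names `ConcurrencyMode`, `contention_unit` and `transactions_conflict` are made up for illustration, key paths are plain tuples, and it assumes both transactions read and then write the entities they touch.

```python
from enum import Enum


class ConcurrencyMode(Enum):
    OPTIMISTIC = "OPTIMISTIC"
    OPTIMISTIC_WITH_ENTITY_GROUPS = "OPTIMISTIC_WITH_ENTITY_GROUPS"


def entity_group(key_path):
    """Return the root of the entity group: the first (kind, id) pair."""
    return key_path[:2]


def contention_unit(mode, key_path):
    """Return the unit on which two transactions can collide in this mode."""
    if mode is ConcurrencyMode.OPTIMISTIC_WITH_ENTITY_GROUPS:
        # Simplified model: the whole entity group is the unit of contention.
        return entity_group(key_path)
    # Simplified model: only the individual entity is the unit of contention.
    return key_path


def transactions_conflict(mode, keys_a, keys_b):
    """True if two concurrent read-write transactions would contend."""
    units_a = {contention_unit(mode, k) for k in keys_a}
    units_b = {contention_unit(mode, k) for k in keys_b}
    return not units_a.isdisjoint(units_b)


bar1 = ("Foo", 1, "Bar", 1)  # Foo/1/Bar/1
bar2 = ("Foo", 1, "Bar", 2)  # Foo/1/Bar/2

for mode in ConcurrencyMode:
    print(mode.value, transactions_conflict(mode, [bar1], [bar2]))
```

In this model the two transactions only collide under OPTIMISTIC_WITH_ENTITY_GROUPS, because both keys share the root Foo/1. If your code relies on that collision to protect some invariant across the group, switching to OPTIMISTIC would silently remove that protection.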

    Outside of these three cases, you are always better off with OPTIMISTIC. Here is some more information on how you can make the switch: https://cloud.google.com/datastore/docs/upgrade-to-firestore#transactions
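
For the entity-group timestamp workaround mentioned under the second reason, a minimal sketch could look like the following. It assumes `client` is a `google.cloud.datastore.Client` from the Python client library; the kind name `GroupTimestampShard`, the shard count, the `group_id` string and both helper functions are hypothetical and only illustrate the idea of writing a sharded timestamp in the same transaction as the actual change.

```python
import datetime
import random

NUM_SHARDS = 8                      # assumption: tune to your write rate
SHARD_KIND = "GroupTimestampShard"  # hypothetical kind name


def shard_keys(client, group_id):
    """Build the keys of all timestamp shards for one logical group."""
    return [client.key(SHARD_KIND, f"{group_id}-{i}") for i in range(NUM_SHARDS)]


def put_with_group_timestamp(client, entity, group_id):
    """Write the entity and one randomly chosen timestamp shard atomically."""
    shard_key = random.choice(shard_keys(client, group_id))
    with client.transaction():
        shard = client.entity(key=shard_key)
        shard["updated"] = datetime.datetime.now(datetime.timezone.utc)
        # Both writes are committed together when the transaction exits.
        client.put_multi([entity, shard])


def latest_group_timestamp(client, group_id):
    """Return the newest shard timestamp, or None if nothing was written yet."""
    shards = client.get_multi(shard_keys(client, group_id))
    return max((s["updated"] for s in shards), default=None)
```

Spreading the timestamp over several shards means concurrent writers rarely touch the same shard entity, which is what keeps this from becoming a hot key. The trade-off is that reading the timestamp needs a lookup of all shards instead of a single entity.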