go, google-cloud-datastore, datastore

Difference between namespace and ancestor in data structure


What is the difference between

key := datastore.NameKey("user", userID, nil)
client.Put(ctx, datastore.IncompleteKey("session", key), &sessionUser)

and

key := &datastore.Key{Kind: "session", Parent: nil, Namespace: userID}
client.Put(ctx, key, &sessionUser)

Why would they be different if both have the same write/read pattern that can cause contention? From this article:

Cloud Datastore prepends the namespace and the kind of the root entity group to the Bigtable row key. You can hit a hotspot if you start to write to a new namespace or kind without gradually ramping up traffic.

I'm really confused about how I should structure my data because of that. By the way, which of them is faster when reading?


Solution

  • The difference is that the namespace contention corner case you mentioned is only a transient one, equivalent (from a root-cause perspective) to this one:

    ...

    If you create new entities at a very high rate for a kind which previously had very few existing entities. Bigtable will start off with all entities on the same tablet server and will take some time to split the range of keys onto separate tablet servers.

    ...

    The transient lasts only until enough tablet splits have occurred to keep up with the rate of write operations. For the case you quoted, a gradual traffic ramp-up gives these splits time to happen before you hit errors, which avoids the issue. Even without a gradual ramp-up, contention can occur only until the splits happen, after which it disappears.

    Using ancestry, on the other hand, raises a permanent problem of a different kind. All entities sharing the same ancestry are placed in the same entity group, so they all share the limit of 1 write per second per entity group. The larger the group, the higher the risk of contention. Using entities without ancestors (with or without namespaces) effectively creates entity groups of size one, which minimizes this type of contention.

    So unless you really, really need the ancestry, I'd suggest avoiding it if your expected usage patterns leave room for contention.

    Side note: that article only touches on write contention, but you should be aware that contention can occur on reads as well (in transactions); see Contention problems in Google App Engine. The entity group size matters in this case too, because a transaction attempts to lock the entire entity group.
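To make the entity group point concrete, here is a minimal sketch that models how the two key layouts from the question map to entity groups. It does not use the real client library: the `Key` struct, `root`, and `entityGroup` are hypothetical stand-ins that mimic only the parent/namespace fields. Named keys are assumed instead of the incomplete keys in the question so the example is deterministic.

```go
package main

import "fmt"

// Key is a simplified, hypothetical stand-in for a Datastore key.
// It is NOT the cloud.google.com/go/datastore type.
type Key struct {
	Kind      string
	Name      string
	Namespace string
	Parent    *Key
}

// root walks the Parent chain up to the top-level (root) key.
func root(k *Key) *Key {
	for k.Parent != nil {
		k = k.Parent
	}
	return k
}

// entityGroup returns an identifier for the group a key belongs to,
// modeled here as the namespace, kind, and name of its root key.
func entityGroup(k *Key) string {
	r := root(k)
	return fmt.Sprintf("%s/%s/%s", r.Namespace, r.Kind, r.Name)
}

func main() {
	// Ancestor approach: two sessions stored under the same user key.
	user := &Key{Kind: "user", Name: "alice"}
	s1 := &Key{Kind: "session", Name: "s1", Parent: user}
	s2 := &Key{Kind: "session", Name: "s2", Parent: user}

	// Namespace approach: two root-level sessions in the user's namespace.
	n1 := &Key{Kind: "session", Name: "s1", Namespace: "alice"}
	n2 := &Key{Kind: "session", Name: "s2", Namespace: "alice"}

	// Sessions with a common ancestor share one group (and its write limit).
	fmt.Println(entityGroup(s1) == entityGroup(s2)) // true
	// Root-level sessions are each their own group, even in one namespace.
	fmt.Println(entityGroup(n1) == entityGroup(n2)) // false
}
```

In this model every session written under the same user competes for that one group's write budget, while the namespaced sessions do not compete with each other in that way.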