python · google-app-engine · google-cloud-datastore · app-engine-ndb

Google Datastore: ndb.put_multi not returning


I am currently reinserting some entities from XML files into Google Datastore using the NDB library. The issue I am observing is that sometimes ndb.put_multi() does not return and the script hangs waiting for it.

The code is basically doing the following:

@ndb.toplevel
def insertAll(entities):
    ndb.put_multi(entities)

entities = []
n_entities = 0
for event, case in tree:
    removeNamespace(case)
    if (case.tag == "MARKGR" and event == "end"):
        # get ndb.Model entities
        tm, app, rep = decodeTrademark(case)

        entities.append(tm)
        for app_et in app:
            entities.append(app_et)
        for rep_et in rep:
            entities.append(rep_et)
        if (len(entities) > 200):
            n_entities += len(entities)
            insertAll(entities)
            entities = []

if (len(entities) > 0):
    insertAll(entities)

I had noticed this behaviour before, but it seems to be pretty nondeterministic. I was wondering whether there is a way to debug this properly, and/or to set a timeout on ndb.put_multi() so I can at least retry it if it does not return after a given time.
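As a stopgap for the "retry if it does not return" part, legacy NDB accepts a `deadline` context option (e.g. `ndb.put_multi(entities, deadline=60)`) that caps each underlying RPC. For a coarser, library-agnostic safety net, the call can also be run in a worker thread with a join timeout. Below is a minimal sketch of that idea; `fn` stands in for `ndb.put_multi`, and the names here are illustrative, not from the question. Note that a timed-out call is not cancelled, the worker thread simply keeps running in the background, but the caller can move on and retry.

```python
import threading


def run_with_timeout(fn, args=(), timeout_s=30.0, retries=3):
    """Run fn(*args) in a worker thread; retry if it does not finish in time.

    This cannot abort the hung call itself -- the worker thread keeps
    running -- but it prevents the main script from blocking forever.
    """
    for attempt in range(retries):
        result = {}

        def worker():
            result["value"] = fn(*args)

        t = threading.Thread(target=worker)
        t.daemon = True  # do not keep the process alive for a hung worker
        t.start()
        t.join(timeout_s)
        if "value" in result:
            return result["value"]
    raise RuntimeError("gave up after %d attempts" % retries)


# Hypothetical usage with NDB (not tested against App Engine):
#   run_with_timeout(ndb.put_multi, args=(entities,), timeout_s=60)
```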

Thanks in advance,


Solution

  • Based on "App Engine datastore tip: monotonically increasing values are bad" by Ikai Lan.

    Monotonically increasing values are those that are written strictly sequentially, like timestamps in logs. In the current Datastore implementation they end up stored in the same location, so Datastore cannot split the workload across servers. When the operation rate gets high enough and Datastore cannot grow horizontally, you will notice a slowdown. This is called hotspotting.

    On top of that, Datastore creates an index for each indexed property (with a few exceptions, such as TextProperty), which means you can have several such hotspots at the same time.

    Workaround

    One of the workarounds mentioned in the official documentation is to prepend a hash to indexed values:

    If you do have a key or indexed property that will be monotonically increasing then you can prepend a random hash to ensure that the keys are sharded onto multiple tablets.

    Read more in "High read/write rates to a narrow key range" in the official documentation.
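The workaround from the documentation can be sketched as follows. The idea is to derive a small, stable shard prefix from a hash of the natural id, so that consecutive ids spread across key ranges; the shard count, function names, and the NDB kind in the comment are assumptions for illustration, not from the answer.

```python
import hashlib

NUM_SHARDS = 16  # assumption: tune to your write rate


def sharded_key_name(natural_id):
    """Prepend a stable hash-derived shard prefix to a monotonically
    increasing id so consecutive writes land on different tablets."""
    digest = hashlib.md5(natural_id.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % NUM_SHARDS
    return "%02d-%s" % (shard, natural_id)


# Hypothetical NDB usage (kind name assumed):
#   key = ndb.Key("Trademark", sharded_key_name(case_id))
```

Because the prefix is derived from a hash of the id itself (not random), the same entity always maps to the same key, so lookups by id still work; range scans over the original ordering, however, now require querying each shard prefix.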