
App Engine's indexing order, cursors, and aggregation


I need to do some continuous aggregation on a data set. I am using App Engine's High Replication Datastore.

Let's say we have a simple object with a property that holds the date it was created, stored as a string. There are other fields on the object, but they're not important for this example.

Let's say I create and store some objects. Below is the date associated with each object. The objects are stored in the order listed, each in a separate transaction.

Obj1: 2012-11-11
Obj2: 2012-11-11
Obj3: 2012-11-12
Obj4: 2012-11-13
Obj5: 2012-11-14

The idea is to use a cursor to continually check for newly indexed objects and perform aggregation on them.
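A minimal sketch of that polling loop, assuming a plain in-memory list stands in for the results of an ORDER BY date query and that a cursor can be modeled as the (date, key) pair of the last entity seen. `Obj`, `poll`, and `aggregate` are hypothetical helpers, not App Engine APIs; a real Datastore cursor is an opaque value returned by the query.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    key: str   # hypothetical entity key, e.g. "Obj1"
    date: str  # ISO date string, so string order matches date order


def poll(index, cursor):
    """Return entities sorted after `cursor`, plus the new cursor.

    `index` is a hypothetical stand-in for the results of ORDER BY date;
    the cursor is the (date, key) of the last entity already processed.
    """
    ordered = sorted(index, key=lambda o: (o.date, o.key))
    new = [o for o in ordered if cursor is None or (o.date, o.key) > cursor]
    if new:
        cursor = (new[-1].date, new[-1].key)
    return new, cursor


def aggregate(totals, entities):
    # Continuous aggregation: count entities per creation date.
    totals.update(o.date for o in entities)
    return totals


index = [Obj("Obj1", "2012-11-11"), Obj("Obj2", "2012-11-11"),
         Obj("Obj3", "2012-11-12")]
totals, cursor = Counter(), None
batch, cursor = poll(index, cursor)
aggregate(totals, batch)  # first run sees Obj1..Obj3

index += [Obj("Obj4", "2012-11-13"), Obj("Obj5", "2012-11-14")]
batch, cursor = poll(index, cursor)
aggregate(totals, batch)  # second run sees only Obj4 and Obj5
print(dict(totals))
```

Including the key as a tie-breaker gives a total order, so the cursor position is unambiguous even when two objects share a date, as Obj1 and Obj2 do.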

Here are the questions I have:

1) Are objects indexed in order? That is, is it possible for Obj4 to be indexed before Obj1, Obj2, and Obj3? This would be an issue if I use an ORDER BY query and a cursor to continue searching: some entities would not be found if there is a delay in indexing.

2) If no ORDER BY is specified, what order are entities returned in a query?

3) How would I go about checking for newly indexed entities? That is, fetch all entities, store the cursor, and later check whether any new entities were indexed since the last query?

A little less important, but food for thought:

4) Are all fields indexed together? That is, if I have a date property and, let's say, a name property, will both properties appear to be indexed at the same time for a given object?

5) If multiple entities are written in the same transaction, are all entities in the transaction indexed at the same time?

6) If all entities belong to the same entity group, are all entities indexed at the same time?
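The concern in question 1 can be reproduced with a toy model, assuming a list of (date, key) rows stands in for whatever index rows happen to be written when the query runs. `poll` is a hypothetical helper, not an App Engine API. Here Obj4's row becomes visible before Obj3's, so the cursor moves past 2012-11-12 and Obj3 is never picked up.

```python
def poll(visible, cursor):
    """Return (date, key) rows sorted after `cursor`, plus the new cursor.

    `visible` is a hypothetical stand-in for the rows an ORDER BY date
    query can currently see, i.e. only entities whose index rows exist.
    """
    rows = sorted(r for r in visible if cursor is None or r > cursor)
    return rows, (rows[-1] if rows else cursor)


seen = []
visible = [("2012-11-11", "Obj1"), ("2012-11-11", "Obj2"),
           ("2012-11-13", "Obj4")]  # Obj4's index row landed before Obj3's
rows, cursor = poll(visible, None)
seen += rows

visible.append(("2012-11-12", "Obj3"))  # Obj3's index row appears late
rows, cursor = poll(visible, cursor)
seen += rows

print([key for _, key in seen])  # ['Obj1', 'Obj2', 'Obj4']: Obj3 was skipped
```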

Thanks for the responses.


Solution

    1. Every indexed property gets a built-in (default) index. If you use ORDER BY someProperty, you get entities ordered by the values of that property. You are correct about index building: queries use indexes, and indexes are built asynchronously, so a query may not find an entity immediately after it was added. For the same reason, there is no guarantee that entities become visible to queries in the order they were written.

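    2. An ORDER BY without an explicit direction defaults to ASC, i.e. ascending order. If your polling depends on order, specify the ORDER BY you need rather than relying on the default result order.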

    3. Add a creation timestamp to your entity, order the query by it, and re-run the query from the saved cursor. See Cursors and Data Updates.

    4. put() can return before the indexes are built, and the indexes are built in parallel. This means that when you query, some indexes may already be built and others not. See Life of a Datastore Write. If you want to force "apply" on an entity, you can issue a get() after put(), which forces the changes to be applied (that is, the index rows to be written).

    5. and 6. All entities touched in the same transaction must be in the same entity group (i.e. share a common root ancestor), unless you use a cross-group (XG) transaction. The transaction isolation docs state that a committed transaction may not yet be applied, meaning that a query after put() may not find the new entities. Again, you can force the entity to be applied with a read (get()) or an ancestor query.
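The commit/apply behavior described in points 4 to 6 can be pictured with a toy model. `ToyDatastore` and its methods are hypothetical and only mimic what the answer describes (put() commits a write, index rows are written when the write is applied, and a get() forces pending writes to be applied); they are not the App Engine API.

```python
class ToyDatastore:
    """Toy model of the commit/apply split described in the answer.

    put() commits a write but leaves it pending; index rows are written only
    when the write is applied. get() applies pending writes for that key
    first. This is an illustrative model, not the App Engine API.
    """

    def __init__(self):
        self.pending = {}  # key -> date, committed but not yet applied
        self.index = {}    # key -> date, visible to non-ancestor queries

    def put(self, key, date):
        self.pending[key] = date

    def apply(self, key):
        # Per the answer, the real service does this asynchronously after put().
        if key in self.pending:
            self.index[key] = self.pending.pop(key)

    def get(self, key):
        self.apply(key)  # a read forces outstanding writes to be applied
        return self.index.get(key)

    def query_by_date(self):
        # Stand-in for an eventually consistent ORDER BY date query.
        return sorted(self.index, key=lambda k: (self.index[k], k))


ds = ToyDatastore()
ds.put("Obj1", "2012-11-11")
print(ds.query_by_date())  # []: committed but not yet applied
ds.get("Obj1")
print(ds.query_by_date())  # ['Obj1']
```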