java google-app-engine google-cloud-datastore

Google Datastore aggregate query


I have been reading a lot about ways to do aggregate queries on the datastore (on Stack Overflow and elsewhere). Most answers say it cannot be done in a pleasant way. But those answers are dated, and the same people also tend to claim that you cannot use ORDER BY on the datastore.

Today you can in fact specify ORDER BY on the datastore, so I am wondering whether aggregation is also possible.

Consider a scenario with five candidates (Alpha, Bravo, Charlie, Delta and Echo) and 10,000 voters. I want to retrieve the candidates and the number of votes each received, ordered by vote count. How would I do that on the datastore? I am using Java.
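For reference, the result being asked for is a "group by candidate, count, order by count descending". Done in plain Java over an in-memory list of votes, it looks like the sketch below. The class name `VoteTally` and representing each vote as a candidate-name `String` are illustrative assumptions; on the datastore this would mean fetching every vote entity and counting client-side on each request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical in-memory tally: count votes per candidate, then order by count. */
public class VoteTally {

    /** Returns (candidate, votes) pairs sorted by vote count, highest first. */
    static List<Map.Entry<String, Long>> tally(List<String> votes) {
        // Equivalent of GROUP BY candidate with COUNT(*)
        Map<String, Long> counts = votes.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // Equivalent of ORDER BY count DESC
        List<Map.Entry<String, Long>> result = new ArrayList<>(counts.entrySet());
        result.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return result;
    }

    public static void main(String[] args) {
        List<String> votes = List.of("Alpha", "Echo", "Alpha", "Delta", "Echo", "Alpha");
        for (Map.Entry<String, Long> e : tally(votes)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

With 10,000 votes this is cheap in memory, but reading all 10,000 entities on every page view is the cost that a stored counter avoids.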

Also, as an aside: if the answer is still no and fanning in is my best option, is fan-in thread-safe? By fanning in I mean keeping an explicit counter of the votes each candidate receives (in a separate table). Could I get a race condition or other data errors when multiple users vote concurrently?
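The race in question is the classic lost update: two requests both read 41 and both write 42. The sketch below illustrates it in memory; it is not datastore code, and `FanInCounter` and its methods are hypothetical. `naiveIncrement` is the unsafe read-then-write pattern. `transactionalIncrement` commits only if the value is unchanged since it was read (using `ConcurrentHashMap.replace(key, oldValue, newValue)` as a compare-and-set) and retries otherwise, which is similar in spirit to wrapping the read and write in a datastore transaction and retrying on conflict.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical single (unsharded) counter per candidate, modelled in memory. */
public class FanInCounter {
    private final ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();

    /** UNSAFE: two threads can read the same value and both write value + 1. */
    void naiveIncrement(String candidate) {
        long current = counts.getOrDefault(candidate, 0L);
        counts.put(candidate, current + 1);
    }

    /** Safe: optimistic read, check, commit; retry if another vote won the race. */
    void transactionalIncrement(String candidate) {
        while (true) {
            Long current = counts.putIfAbsent(candidate, 1L);
            if (current == null) {
                return; // first vote for this candidate was stored
            }
            if (counts.replace(candidate, current, current + 1)) {
                return; // value was unchanged since we read it, so the write counts
            }
            // another vote committed first: loop, re-read and try again
        }
    }

    long get(String candidate) {
        return counts.getOrDefault(candidate, 0L);
    }

    public static void main(String[] args) throws InterruptedException {
        FanInCounter counter = new FanInCounter();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 10_000; i++) {
            pool.submit(() -> counter.transactionalIncrement("Alpha"));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("Alpha: " + counter.get("Alpha"));
    }
}
```

The transactional version never loses a vote, but every vote still contends on the same value. With a single datastore entity per candidate, that contention limits how many votes per second can be written, which is the problem sharded counters address.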


Solution

  • If by aggregating you mean having the datastore compute the total number of votes for you, then no, the datastore won't do that.

    The best way to do what you're describing is:

    • Create a set of sharded counters per candidate (search for "App Engine sharded counters").
    • When someone votes, increment one randomly chosen shard of that candidate's counter, inside a transaction.
    • When you want to read the votes, query for your candidates, then for each candidate query for its shards and sum them up; sort the totals in memory to get the candidates in order.
    • Use memcache for better performance; the GAE sharded counters example in the docs shows this well.
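The sharded-counter steps can be modelled in memory to show how the pieces fit together. This is a sketch under stated assumptions, not App Engine code: `ShardedVoteCounter`, `NUM_SHARDS = 20` and the use of `AtomicLongArray` are illustrative stand-ins. In the datastore, each array cell would be a separate shard entity incremented in a transaction, and the totals would be read back (and cached in memcache) rather than held in a Java map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

/**
 * Hypothetical in-memory model of sharded counters: each candidate's count is
 * split across NUM_SHARDS cells so concurrent votes rarely touch the same cell.
 */
public class ShardedVoteCounter {
    static final int NUM_SHARDS = 20; // assumption: tune to the expected write rate

    private final Map<String, AtomicLongArray> shards = new ConcurrentHashMap<>();

    /** Record one vote by atomically incrementing a randomly chosen shard. */
    void vote(String candidate) {
        AtomicLongArray s = shards.computeIfAbsent(candidate, k -> new AtomicLongArray(NUM_SHARDS));
        int index = ThreadLocalRandom.current().nextInt(NUM_SHARDS);
        s.incrementAndGet(index);
    }

    /** A candidate's total is the sum of all of its shards. */
    long total(String candidate) {
        AtomicLongArray s = shards.get(candidate);
        if (s == null) {
            return 0;
        }
        long sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.get(i);
        }
        return sum;
    }

    /** Candidates with their totals, most votes first (sorted in memory). */
    List<Map.Entry<String, Long>> standings(List<String> candidates) {
        List<Map.Entry<String, Long>> result = new ArrayList<>();
        for (String c : candidates) {
            result.add(Map.entry(c, total(c)));
        }
        result.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return result;
    }

    public static void main(String[] args) {
        ShardedVoteCounter counter = new ShardedVoteCounter();
        List<String> candidates = List.of("Alpha", "Bravo", "Charlie", "Delta", "Echo");
        int[] votesPer = {500, 3000, 1500, 1000, 4000}; // hypothetical split of 10,000 voters
        List<String> ballots = new ArrayList<>();
        for (int k = 0; k < candidates.size(); k++) {
            for (int v = 0; v < votesPer[k]; v++) {
                ballots.add(candidates.get(k));
            }
        }
        ballots.parallelStream().forEach(counter::vote); // concurrent voting
        for (Map.Entry<String, Long> e : counter.standings(candidates)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

More shards mean less write contention per candidate but more reads to compute each total; with only five candidates, summing and sorting in memory is cheap, and caching the standings avoids re-reading the shards on every request.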