I'm developing a Python app on Google App Engine that counts different kinds of clicks using sharded counters.
My problem is that I want click statistics broken down by hour, not just a total sum of all clicks.
One way to achieve that is to include a timestamp in the shard key names:
import random

from google.appengine.ext import db


def txn():
    # pick a random shard to spread the write load across entities
    index = random.randint(0, config.num_shards - 1)
    shard_name = code + str(index)  # + timestamp without seconds
    counter = ClickCounter.get_by_key_name(shard_name)
    if counter is None:
        counter = ClickCounter(key_name=shard_name, code=code)
    counter.click += 1
    counter.put()

db.run_in_transaction(txn)
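For concreteness, the hourly suffix could be built roughly like this; the hour_bucket helper, the '%Y%m%d%H' format, and the key layout are just placeholders I'm assuming, nothing is fixed yet:

import random
from datetime import datetime

NUM_SHARDS = 20  # stand-in for config.num_shards

def hour_bucket(dt=None):
    # Truncate a UTC timestamp to the hour, so every click in the same
    # hour maps to the same group of shard key names.
    dt = dt or datetime.utcnow()
    return dt.strftime('%Y%m%d%H')  # e.g. '2011062814'

def make_shard_name(code):
    # Assumed key layout: <code>_<shard index>_<hour bucket>
    index = random.randint(0, NUM_SHARDS - 1)
    return '%s_%d_%s' % (code, index, hour_bucket())

print(make_shard_name('banner42'))  # e.g. 'banner42_7_2011062814'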
The problem is that summing all the shards for a month would then be more than 700 times slower, since there would be roughly 24 hours × 30 days ≈ 720 shard groups to read instead of one.
Is there a good way to cache the results? I mean, once an hour has passed, that hour's counters won't change anymore. And is there a drawback to saving every click as its own object instead?
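What I had in mind for the caching is something like the sketch below, using memcache and relying on closed hours being immutable; the hour property on ClickCounter, the cache key format, and the get_hour_total name are all assumptions on my part:

from google.appengine.api import memcache
from google.appengine.ext import db

def get_hour_total(code, hour_bucket):
    # Assumed cache key layout; hour_bucket is e.g. '2011062814'.
    cache_key = 'clicks:%s:%s' % (code, hour_bucket)
    total = memcache.get(cache_key)
    if total is not None:
        return total

    # Assumes ClickCounter stores the code and the hour bucket as
    # indexed properties alongside the click count.
    shards = db.GqlQuery(
        'SELECT * FROM ClickCounter WHERE code = :1 AND hour = :2',
        code, hour_bucket)
    total = sum(counter.click for counter in shards)

    # A finished hour never changes, so the cached value can live
    # indefinitely (memcache may still evict it under memory pressure).
    memcache.set(cache_key, total)
    return total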
Your solution will work - just use the task queue to aggregate your sharded records into nice, reporting-friendly summary records as you go.
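Something along these lines, as a rough sketch; HourlySummary, the hour property on ClickCounter, the handler route, and the use of webapp2 are all names I'm assuming for illustration, not an existing API:

from google.appengine.api import taskqueue
from google.appengine.ext import db
import webapp2  # or whichever webapp framework the app already uses

def enqueue_hour_aggregation(code, hour):
    # Enqueue one aggregation task per counter code once the hour closes.
    taskqueue.add(url='/tasks/aggregate-hour',
                  params={'code': code, 'hour': hour})

class HourlySummary(db.Model):
    # Assumed summary model: one entity per (code, hour) pair.
    code = db.StringProperty()
    hour = db.StringProperty()   # e.g. '2011062814'
    clicks = db.IntegerProperty(default=0)

class AggregateHourHandler(webapp2.RequestHandler):
    def post(self):
        code = self.request.get('code')
        hour = self.request.get('hour')

        # Sum the (now immutable) shards for the closed hour...
        shards = db.GqlQuery(
            'SELECT * FROM ClickCounter WHERE code = :1 AND hour = :2',
            code, hour)
        total = sum(s.click for s in shards)

        # ...and write a single reporting-friendly summary record.
        HourlySummary(key_name='%s:%s' % (code, hour),
                      code=code, hour=hour, clicks=total).put()

app = webapp2.WSGIApplication([('/tasks/aggregate-hour', AggregateHourHandler)])

Monthly reports then read ~720 small HourlySummary entities (or even a further-rolled-up daily summary) instead of every shard, and you never have to touch the shards for closed hours again.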