google-app-engine, optimization, datastore

How best to reduce CPU time used by a datastore put


I have a cron job that runs every three minutes; it pulls some data from a remote API and stores it in my local datastore. However, the datastore put is using a huge amount of CPU time. I suspect I'm doing something really stupid which could be optimised a lot:

import json
import urllib2

from google.appengine.ext import db

result = urllib2.urlopen(url).read()
data = json.loads(result)  # parse the response once instead of once per key
foos = data['foo']
bars = data['bar']

models = []
for foo in foos:
    d = FooContainer()
    d.property = foo['value']  # in real code, this sets a load of values based off foo
    models.append(d)

for bar in bars:
    d = BarContainer()
    d.property = bar['value']  # in real code, this sets a load of properties based off bar
    models.append(d)

db.put(models)  # one batched put for all entities

As you can see, I'm storing every piece of data returned as a new "row" in my local datastore tables. Is there some technique I can use to reduce the huge datastore CPU time used by this cron job?


Solution

  • ~2k cpu_ms looks about right. You are seeing 46k api cpu_ms because the datastore can write at most 10 entities per second (a limit governed by the api), and you are writing 450+ entities: 450+/10 ≈ 45+ seconds ≈ 45,000+ ms, which matches the ~46k api cpu_ms you are seeing (see the sketch below).

    The api cpu_ms doesn't count directly against the bottom line of your quota; only the real ~2k cpu_ms will. So don't worry about it, you're just fine.
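
    To make the arithmetic concrete, here is a minimal sketch of the estimate. The 10 writes/second cap and the 450+ entity count come straight from the answer above; everything else is illustrative.

    entities_written = 450       # approximate number of entities per cron run
    writes_per_second = 10       # datastore write cap enforced by the api
    wall_seconds = entities_written / float(writes_per_second)  # ~45 s
    api_cpu_ms = wall_seconds * 1000                            # ~45,000 ms
    print 'estimated api cpu_ms: ~%d' % api_cpu_ms  # close to the observed ~46k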
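
    If you want to see the real/api split for yourself at runtime, the old Python SDK shipped a quota module for exactly this. A minimal sketch, reusing the models list from the question, and assuming the legacy quota module is still available in your SDK version (it was deprecated and later removed):

    import logging

    from google.appengine.api import quota  # legacy SDK module; an assumption here
    from google.appengine.ext import db

    real_before = quota.get_request_cpu_usage()      # request cpu so far, in megacycles
    api_before = quota.get_request_api_cpu_usage()   # api cpu so far, in megacycles
    db.put(models)
    real_ms = quota.megacycles_to_cpu_seconds(
        quota.get_request_cpu_usage() - real_before) * 1000
    api_ms = quota.megacycles_to_cpu_seconds(
        quota.get_request_api_cpu_usage() - api_before) * 1000
    logging.info('put: ~%.0f real cpu_ms, ~%.0f api cpu_ms', real_ms, api_ms)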