I have a cron job that runs every three minutes, pulls some data from a remote API, and stores it in my local datastore. However, the datastore put operation is taking a huge amount of CPU time. I suspect I'm doing something really stupid that can be optimised a lot:
import json
import urllib2

from google.appengine.ext import db

result = urllib2.urlopen(url).read()
data = json.loads(result)  # parse the response once, not once per key
foos = data['foo']
bars = data['bar']

models = []
for foo in foos:
    d = FooContainer()
    d.some_property = foo['value']  # in real code, this sets a load of values based off foo
    models.append(d)
for bar in bars:
    d = BarContainer()
    d.some_property = bar['value']  # in real code, this sets a load of properties based off bar
    models.append(d)
db.put(models)  # one batched put for all the entities
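FooContainer and BarContainer here are ordinary db.Model subclasses; the property name is just a placeholder, roughly:

class FooContainer(db.Model):
    # one property shown; the real model sets many more
    some_property = db.StringProperty()

class BarContainer(db.Model):
    some_property = db.StringProperty()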
As you can see, I'm storing every piece of data returned as a new "row" in my local datastore tables. Is there some technique I can use to reduce the huge datastore CPU time used by this cron job?
~2k cpu_ms looks about right. You are seeing 46k api cpu_ms because the datastore can only write a maximum of about 10 entities per second (a limit governed by the api), and you are writing 450+ entities: 450 / 10 ≈ 45 seconds, i.e. roughly 45,000 ms, which matches the ~46k api cpu_ms you're seeing.

The api usage doesn't count directly against the bottom line of your quota; only the real ~2k will. So don't worry about it, you're just fine.
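As a rough sanity check of that arithmetic (the ~10 writes/sec figure is the assumption here):

entities = 450
writes_per_second = 10  # approximate datastore write throughput governed by the api
api_cpu_ms = entities / float(writes_per_second) * 1000
print api_cpu_ms  # ~45000 ms, in line with the ~46k api cpu_ms reported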