python · google-app-engine · datastore

Is there a way to cache the fetch output?


I'm working on a closed system running in the cloud.

What I need is a search function that uses user-typed-in regexp to filter the rows in a dataset.

phrase = re.compile(request.get("query"))
data = Entry.all().fetch(50000)  # this takes around 10s when there are 6000 records
result = [x for x in data if phrase.search(x.title)]

Now, the database itself won't change too much, and there will be no more than 200-300 searches a day.

Is there a way to somehow cache all the Entries (I expect there will be no more than 50,000 of them, each no bigger than 500 bytes), so that retrieving them doesn't take more than 10 seconds? Or perhaps to parallelize it? I don't mind 10 CPU-seconds, but I do mind 10 seconds that the user has to wait.

To address any answers like "index it and use .filter()" - the query is a regexp, and I don't know of any indexing mechanism that would allow using a regexp.


Solution

  • Since there is a bounded number of entries, you can memcache all of them and then do the filtering in memory as you've outlined. However, note that each memcache entry cannot exceed 1 MB, and you can fetch up to 32 MB of memcache entries in parallel.

    So split the entries into sub sets, memcache the subsets and then read them in parallel by precomputing the memcache key.

    More here:

    http://code.google.com/appengine/docs/python/memcache/functions.html
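A minimal sketch of the split-and-precompute-keys idea described above. The `set_multi`/`get_multi` stand-ins below are a dict-backed stand-in for the App Engine `memcache` module (which exposes functions of the same names), so the sketch runs outside GAE; the chunk size and key prefix are illustrative choices, tuned so each pickled chunk stays under the 1 MB value limit.

```python
import pickle

# Stand-in for google.appengine.api.memcache so this sketch runs anywhere;
# on App Engine you would call memcache.set_multi / memcache.get_multi.
_store = {}

def set_multi(mapping):
    _store.update(mapping)

def get_multi(keys):
    return {k: _store[k] for k in keys if k in _store}

CHUNK_SIZE = 1000  # chosen so each pickled chunk stays well under 1 MB

def cache_entries(entries, prefix="entries"):
    """Split entries into chunks, pickle each, and cache under numbered keys."""
    chunks = [entries[i:i + CHUNK_SIZE]
              for i in range(0, len(entries), CHUNK_SIZE)]
    set_multi({"%s:%d" % (prefix, i): pickle.dumps(chunk)
               for i, chunk in enumerate(chunks)})
    return len(chunks)

def load_entries(num_chunks, prefix="entries"):
    """Precompute the keys and fetch all chunks in one get_multi call."""
    keys = ["%s:%d" % (prefix, i) for i in range(num_chunks)]
    cached = get_multi(keys)
    entries = []
    for key in keys:
        entries.extend(pickle.loads(cached[key]))
    return entries
```

Because the keys are derived from the prefix and a counter, the reader never has to query the datastore to discover them; one `get_multi` round trip pulls every chunk, and the regexp filter then runs over the in-memory list.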