python google-app-engine django-cms gae-search

What is the best approach for managing static information for a site, while implementing the Search API across it?


Google recently released a new Search API that you can integrate into your Google App Engine application to search documents and information within your site. Cool!

I have a site that has quite a few Django resources that contain a significant amount of static information. I would like to integrate this information into a site-wide search engine using the new Search API.

For someone with an existing site and numerous text resources used for content, what is the best way of integrating the static information (from flat HTML files) into the site's Search API datastore? Bonus question: what is the best way to manage this content so that, as I add pages to the site, they are integrated into the search datastore?


Solution

  • The Search API can only search documents that you have added to the search backend. For your static resources, this means you have to crawl them and add them to the search backend using the Search API.

    You probably want to do this after every upload. Perhaps the easiest way is a cron job that traverses your files and checks their timestamps: if a file is newer than the last traversal (or has never been traversed), add it to or update it in the search backend. Instead of a cron job, you could also define a handler that triggers the traversal, and request that handler after you deploy a new app version.
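
The traversal described above could look roughly like the sketch below. It is only an illustration under a few assumptions: the function names (`find_changed_files`, `reindex`, `put_in_search_api`), the index name `static-pages`, and the field name `content` are all made up for this example; the files must be readable by your application code; and file modification times must be meaningful in your environment (if they are not, comparing a content hash instead would be one alternative). The timestamps are kept in a plain dict here, so in a real app you would need to persist them somewhere, for example in the datastore.

```python
import io
import os


def find_changed_files(root, last_indexed, extensions=(".html",)):
    """Yield (relative_path, mtime) for files that are new or modified.

    `last_indexed` maps a relative path to the mtime seen on the previous run.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(extensions):
                continue
            full_path = os.path.join(dirpath, filename)
            rel_path = os.path.relpath(full_path, root)
            mtime = os.path.getmtime(full_path)
            previous = last_indexed.get(rel_path)
            if previous is None or mtime > previous:
                yield rel_path, mtime


def reindex(root, last_indexed, put_document):
    """Pass every new or modified file to `put_document`; return updated paths."""
    updated = []
    for rel_path, mtime in find_changed_files(root, last_indexed):
        with io.open(os.path.join(root, rel_path), encoding="utf-8") as f:
            put_document(rel_path, f.read())
        last_indexed[rel_path] = mtime  # remember what we indexed
        updated.append(rel_path)
    return updated


def put_in_search_api(doc_id, html, index_name="static-pages"):
    """Store one page in the App Engine Search API.

    Hypothetical helper: the index and field names are assumptions. The import
    is deferred because the SDK is only available on App Engine. Note that
    document IDs must not contain whitespace, so paths may need escaping.
    """
    from google.appengine.api import search

    document = search.Document(
        doc_id=doc_id,
        fields=[search.HtmlField(name="content", value=html)],
    )
    # Putting a document with an existing doc_id replaces the old version.
    search.Index(name=index_name).put(document)
```

A cron job or a post-deploy handler would then call something like `reindex(pages_dir, timestamps, put_in_search_api)`, where `pages_dir` and `timestamps` are whatever directory and persisted dict your app uses. Passing the indexing function in as an argument keeps the traversal testable without the App Engine SDK.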