python search indexing information-retrieval xapian

How to implement searching for a specified user's documents?

In my current project, users can like songs, and now I'm going to add a song search so that a user can search for some song she has liked before.

I have implemented search engines using xapian before, which involves building indexes of documents periodically.

In my case, do I have to build indexes for every user's songs independently?

If I want the search results to be more real-time, does this mean that I need to build indexes incrementally every short period of time?

Solution

Taking your questions separately:

Do I have to build indexes for every user's songs independently?

No. A common technique for this kind of situation is to index each like as its own document, containing both the information about the song and the identifier of the user who liked it. Then, when you search, you filter the results of the free-text search by the identifier of the user who is actually logged in.

In Xapian you'd do this by adding a term representing the user (with a suitable prefix, so you might have XU175 for a user with id 175), and then using OP_FILTER to restrict the search to likes by the logged-in user.
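To make the idea concrete without depending on any particular search library, here is a minimal sketch of the technique using a toy in-memory inverted index. It is not Xapian code: the `LikeIndex` class, its method names, and the `USER_PREFIX` constant are all hypothetical, chosen only to mirror the "XU175" convention above. The final intersection in `search` plays the role that OP_FILTER would play in a real Xapian query.

```python
from collections import defaultdict

USER_PREFIX = "XU"  # hypothetical prefix for user-id terms, as in "XU175"


class LikeIndex:
    """Toy in-memory inverted index: one document per (user, song) like."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.docs = []                    # document id -> (user_id, title)

    def add_like(self, user_id, title):
        doc_id = len(self.docs)
        self.docs.append((user_id, title))
        # Free-text terms from the song title (lowercased).
        for word in title.lower().split():
            self.postings[word].add(doc_id)
        # Prefixed filter term identifying the user who liked the song.
        # The uppercase prefix cannot collide with lowercased text terms.
        self.postings[f"{USER_PREFIX}{user_id}"].add(doc_id)
        return doc_id

    def search(self, text, user_id):
        # Documents that contain every word of the text query...
        word_sets = [self.postings.get(w, set()) for w in text.lower().split()]
        matches = set.intersection(*word_sets) if word_sets else set()
        # ...restricted to documents that carry the user's filter term.
        matches = matches & self.postings.get(f"{USER_PREFIX}{user_id}", set())
        return [self.docs[d][1] for d in sorted(matches)]
```

With this layout there is a single index for all users, and searching as user 175 only ever returns documents that were indexed with the term XU175.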

Do I need to build indexes incrementally every short period of time [to support real-time indexing]?

This depends entirely on the search system you're using. With Xapian you can either do that and periodically 'compact' the generated databases into a single base database, or you can index live into the database. Since Xapian is single-writer, for live indexing you'd want a way of serialising the writes, such as putting new likes onto a queue and having a single process that pops them off and indexes them into the database. One largely off-the-shelf solution would be to use Restpose, written by one of the Xapian developers, which fills the same sort of role as Solr does for Lucene.
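The queue-plus-single-writer arrangement might look roughly like the sketch below. It uses only the Python standard library, and a thread stands in for the dedicated indexing process; in a real deployment you would more likely have a separate process fed by a message queue. The names `start_writer`, `stop_writer`, and the `index_like` callback are hypothetical, and `index_like` is where the actual write to the search database would go.

```python
import queue
import threading

_STOP = object()  # sentinel telling the writer to shut down


def run_single_writer(like_queue, index_like):
    """Pop likes off the queue and index them one at a time."""
    while True:
        item = like_queue.get()
        if item is _STOP:
            break
        user_id, title = item
        # The only place that writes to the index, so writes are serialised.
        index_like(user_id, title)


def start_writer(index_like):
    """Start the single writer and return the queue that producers put to."""
    like_queue = queue.Queue()
    writer = threading.Thread(
        target=run_single_writer, args=(like_queue, index_like)
    )
    writer.start()
    return like_queue, writer


def stop_writer(like_queue, writer):
    """Ask the writer to finish the queued likes and wait for it to exit."""
    like_queue.put(_STOP)
    writer.join()
```

Any number of web requests can call `like_queue.put((user_id, title))` concurrently, but only the writer ever touches the index, which is what a single-writer database needs.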

You can also get fancier by indexing into one database, then replicating that to another and searching the replicated version, which also gives you options to scale horizontally in future. There's a discussion of replication in the Xapian documentation.