I am currently working on a web application where, ideally, I would be able to support a search bar over the documents that are going to be stored for users. Each of these documents will range from a small snippet up to a decently-sized article. (I don't imagine any document will be larger than a few KB of text for search purposes.)

As I have been reading about the proper ways of using RethinkDB, one bit of information that has stuck out as worrying is the performance of operations such as a filter on non-indexed data, where I have seen people mention multiple minutes spent in a single call. Considering that, over the long run, I expect at least 10,000+ documents (and in the really long run, 100,000+, 1,000,000+, etc.), is there a way to search those documents with sub-second (preferably tens-of-milliseconds) response times using the standard RethinkDB API? Or am I going to have to come up with a separate scheme that allows for quick search through clever use of indexes? Or would I be better off using another database that provides that capability?
If you don't use an index, your query has to look at every document in your table, so it will get slower as your table gets larger. 10,000 documents should be reasonable to scan through on fast hardware, but you probably can't do it in tens of milliseconds, and searching through millions of documents will probably be slow.
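To illustrate the "separate scheme through clever use of indexes" the question mentions, here is a minimal in-memory sketch of an inverted index: a map from term to the set of document ids containing it, so a search becomes a few lookups and a set intersection instead of a scan over every document. All names here are hypothetical; in RethinkDB you would persist something similar by storing a keywords array on each document and creating a multi index on it, then querying with `getAll`.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split text into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

class InvertedIndex:
    def __init__(self):
        # term -> set of ids of documents containing that term
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # Intersect the posting sets for every query term (AND semantics);
        # cost is proportional to the posting sizes, not the table size.
        sets = [self.postings.get(t, set()) for t in tokenize(query)]
        if not sets:
            return set()
        return set.intersection(*sets)

index = InvertedIndex()
index.add(1, "RethinkDB stores JSON documents")
index.add(2, "Searching documents without an index is slow")
index.add(3, "Elasticsearch is built for full-text search")

print(sorted(index.search("documents")))  # → [1, 2]
```

This gives exact-term lookup only; stemming, ranking, and phrase queries are where a dedicated search engine starts to earn its keep.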
You may want to look into Elasticsearch as a way to do this: http://www.rethinkdb.com/docs/elasticsearch/