
Couchbase or Riak as data storage for a search engine


I want to implement a text search engine. In particular, each document to index will be a list of terms with weights.
The query is a simple list of terms.
The output of a search should be a list sorted by relevance (how well the query matches each document's terms and their weights). The data I need to store is big! It won't fit on a single node, so the final storage must be easy to distribute.
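The question does not define the relevance function, so the following is only a minimal in-memory sketch assuming the score is the sum of a document's weights for the query terms it contains. It also assumes documents are partitioned across shards by a hash of their id, so each shard ranks its own documents and a coordinator merges the local top-k lists (scatter-gather). `Shard`, `ShardedIndex` and their methods are hypothetical names, not part of the Couchbase or Riak APIs.

```python
import hashlib
import heapq
from collections import defaultdict


class Shard:
    """One partition of a document-partitioned inverted index (hypothetical)."""

    def __init__(self):
        # term -> {doc_id: weight}
        self.postings = defaultdict(dict)

    def add(self, doc_id, weighted_terms):
        for term, weight in weighted_terms.items():
            self.postings[term][doc_id] = weight

    def top_k(self, query_terms, k):
        # Assumed scoring: sum of the document's weights for matching query terms.
        scores = defaultdict(float)
        for term in set(query_terms):
            for doc_id, weight in self.postings.get(term, {}).items():
                scores[doc_id] += weight
        return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


class ShardedIndex:
    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def _shard_for(self, doc_id):
        # Stable hash so a given document always lands on the same shard.
        digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def add(self, doc_id, weighted_terms):
        self._shard_for(doc_id).add(doc_id, weighted_terms)

    def search(self, query_terms, k=10):
        # Scatter: every shard ranks its own documents.
        # Gather: each document lives on exactly one shard, so the global
        # top-k is contained in the union of the local top-k lists.
        candidates = []
        for shard in self.shards:
            candidates.extend(shard.top_k(query_terms, k))
        return heapq.nlargest(k, candidates, key=lambda item: item[1])


index = ShardedIndex(num_shards=4)
index.add("doc1", {"couchbase": 0.5, "riak": 0.25})
index.add("doc2", {"riak": 0.75, "search": 0.5})
index.add("doc3", {"postgres": 0.5})
print(index.search(["riak", "search"]))  # [('doc2', 1.25), ('doc1', 0.25)]
```

Partitioning by document (rather than by term) keeps each document's full score on one node, which is why a simple merge of per-shard results is enough; the trade-off is that every query has to touch every shard.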

Which data store would you recommend? After some analysis, I would like to choose between Couchbase and Riak.

[edit] What do you think about plain relational databases? They already have some nice mechanisms for distribution (e.g. Postgres 9 has this built in). [/edit]
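For comparison, a relational store could hold the same data as one row per (document, term, weight) and rank with `SUM ... GROUP BY ... ORDER BY`. This is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in so it runs without a server; the `doc_terms` table and the `search` helper are hypothetical, and the sketch does not show how the table would be partitioned across several Postgres nodes, which would need extra tooling on top.

```python
import sqlite3

# In-memory SQLite as a stand-in for Postgres; the schema is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE doc_terms (
        doc_id TEXT NOT NULL,
        term   TEXT NOT NULL,
        weight REAL NOT NULL,
        PRIMARY KEY (term, doc_id)  -- term first, so lookups by term can use the key index
    )
    """
)
conn.executemany(
    "INSERT INTO doc_terms (doc_id, term, weight) VALUES (?, ?, ?)",
    [
        ("doc1", "couchbase", 0.5),
        ("doc1", "riak", 0.25),
        ("doc2", "riak", 0.75),
        ("doc2", "search", 0.5),
    ],
)


def search(conn, query_terms, limit=10):
    """Sum the stored weights of matching terms per document, best match first."""
    if not query_terms:
        return []
    placeholders = ", ".join("?" for _ in query_terms)
    sql = (
        "SELECT doc_id, SUM(weight) AS score FROM doc_terms "
        f"WHERE term IN ({placeholders}) "
        "GROUP BY doc_id ORDER BY score DESC LIMIT ?"
    )
    return conn.execute(sql, [*query_terms, limit]).fetchall()


print(search(conn, ["riak", "search"]))  # [('doc2', 1.25), ('doc1', 0.25)]
```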

Riak has built-in search functionality, but as far as I can tell I don't want to use it, because I need a prebuilt index to answer queries (instead of computing the results for every query).

On the other hand, Couchbase 2 "adds secondary indexes for JSON documents. Indexes are created via Views which can then be queried. Indexing is evenly distributed."
That sounds like a great benefit for Couchbase.
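In Couchbase 2, a view is defined by a JavaScript map function that calls `emit(key, value)` for each document, and the resulting index is kept sorted by key and queried by key. The Python below only emulates that shape in-process, to show how a term-keyed view could serve this kind of query; `map_terms`, `View` and `search` are hypothetical names, not the Couchbase SDK, and in this sketch the per-document score aggregation still happens in the application after the per-term lookups.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def map_terms(doc_id, doc):
    """Hypothetical map step: one row per term, like emit(term, [doc_id, weight])."""
    for term, weight in doc["terms"].items():
        yield term, (doc_id, weight)


class View:
    """In-process stand-in for a key-sorted view index (not the Couchbase API)."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows = []  # sorted (key, value) rows
        self.keys = []

    def build(self, docs):
        self.rows = sorted(
            row for doc_id, doc in docs.items() for row in self.map_fn(doc_id, doc)
        )
        self.keys = [key for key, _ in self.rows]

    def query(self, key):
        # Single-key lookup is a range scan over the sorted rows.
        lo, hi = bisect_left(self.keys, key), bisect_right(self.keys, key)
        return [value for _, value in self.rows[lo:hi]]


def search(view, query_terms, k=10):
    # Look up each query term in the view, then sum weights per document.
    scores = defaultdict(float)
    for term in set(query_terms):
        for doc_id, weight in view.query(term):
            scores[doc_id] += weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


docs = {
    "doc1": {"terms": {"couchbase": 0.5, "riak": 0.25}},
    "doc2": {"terms": {"riak": 0.75, "search": 0.5}},
}
view = View(map_terms)
view.build(docs)
print(search(view, ["riak", "search"]))  # [('doc2', 1.25), ('doc1', 0.25)]
```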


Solution

  • I'd recommend Riak Search for full-text search: it's quite powerful and borrows most of the advantages of Lucene, while still being transparently fault-tolerant, replicated and scalable. If your data does not fit on a single node, it's probably the most balanced open-source solution.
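The "replicated and scalable" part rests on Riak spreading keys over a consistent-hashing ring and storing several replicas of each key. The following is a generic sketch of that idea only, not Riak's actual partitioning or claim algorithm; `HashRing`, `vnodes_per_node` and `n_replicas` are hypothetical names chosen for illustration.

```python
import hashlib
from bisect import bisect_right


class HashRing:
    """Hypothetical consistent-hash ring: each key maps to n_replicas distinct nodes."""

    def __init__(self, nodes, vnodes_per_node=8, n_replicas=3):
        if n_replicas > len(nodes):
            raise ValueError("n_replicas cannot exceed the number of nodes")
        self.n_replicas = n_replicas
        # Each physical node owns several points ("virtual nodes") on the ring,
        # which evens out the share of keys each node receives.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self.points = [point for point, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")

    def preference_list(self, key):
        # Walk clockwise from the key's position, collecting distinct physical nodes.
        start = bisect_right(self.points, self._hash(key)) % len(self.ring)
        owners = []
        for offset in range(len(self.ring)):
            node = self.ring[(start + offset) % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == self.n_replicas:
                    break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.preference_list("doc42"))  # three distinct nodes that hold copies of doc42
```

Because only the ring segments next to a new node's points change owner, adding a node moves a fraction of the keys instead of reshuffling everything, and losing one node still leaves the other replicas in each key's preference list.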