Tags: search, solr, full-text-search, solandra

Frequent Updates to Solr Documents - Efficiency/Scalability concerns


I have a Solr index with document fields something like:

id, body_text, date, num_upvotes, num_downvotes

In my application, a document is created with some integer id and some body_text (500 chars max). The date is set to the time of input, and num_upvotes and num_downvotes begin at 0.

My application lets users upvote and downvote the content mentioned above. The reason I want to track this in Solr instead of just the DB is that I want to factor the number of upvotes and downvotes into my search.

This is a problem because you can't simply update a Solr document in place (e.g. increment num_upvotes); you must replace the entire document, which is probably fairly inefficient since it would require hitting my DB to grab all the relevant data again.

I realize the solution may require a different layout of data, or possibly multiple indexes (although I don't know if you can query/score across Solr cores).

Is anyone able to offer any recommendations on how to tackle this?


Solution

  • A solution that I use for a similar problem is to update that information in the database and do SOLR updates/inserts every ten minutes, using the documents that were modified since the last update.

    Also, every night, when I don't have much traffic, I optimize the index. I have also set up some warm-up queries in the SOLR config that run after each import.

    My SOLR index holds around 1.5 million documents; each document has 24 fields and around 2000 characters in total. Every 10 minutes I update around 500 documents (without optimizing the index), and I run around 50 warm-up queries made up of the most common facets, the most used filter queries, and free-text searches.

    I don't see any negative impact on performance (at least nothing visible): my queries average 0.1 seconds, compared with 0.09 seconds before I started updating every 10 minutes.

    LATER EDIT:

    I haven't encountered any problems during these updates. I always take the documents from the database and insert them into SOLR with a unique key. If the document already exists in SOLR, it is replaced (this is what I mean by update).

    It never takes more than 3 minutes to update SOLR. Actually, I take a 10-minute break after each update: I start the index update, wait for it to finish, and then wait another 10 minutes before starting again.

    I did not look at the performance overnight, but for me it is not relevant, as I want fresh data during peaks in user visits.
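
As a rough illustration of the cycle described above (pull the documents modified since the last run, push them to SOLR as full documents, then pause), here is a minimal Python sketch. It is not the answerer's actual code. The `posts` table, its `modified_at` column, the core name in the URL, and the helper names are all assumptions made for the example, and it assumes a Solr version whose `/update` handler accepts a JSON array of documents.

```python
import json
import sqlite3
import time
import urllib.request

# Hypothetical endpoint: adjust host, port, and core name for your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/posts/update?commit=true"

FIELDS = ("id", "body_text", "date", "num_upvotes", "num_downvotes")


def fetch_modified(conn, since):
    """Return (docs, newest_timestamp) for rows modified after `since`.

    Assumes a hypothetical `posts` table with a numeric `modified_at` column
    that the application bumps whenever a vote or edit occurs.
    """
    rows = conn.execute(
        "SELECT id, body_text, date, num_upvotes, num_downvotes, modified_at "
        "FROM posts WHERE modified_at > ? ORDER BY modified_at",
        (since,),
    ).fetchall()
    docs = [dict(zip(FIELDS, row[:5])) for row in rows]
    newest = rows[-1][5] if rows else since
    return docs, newest


def post_to_solr(docs, url=SOLR_UPDATE_URL):
    """Send full documents; a document whose unique key already exists is replaced."""
    request = urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def sync_once(conn, since, send=post_to_solr):
    """Push every document modified after `since`; return the new high-water mark."""
    docs, newest = fetch_modified(conn, since)
    if docs:
        send(docs)
    return newest


def run_forever(conn, pause_seconds=600):
    """Update, wait for it to finish, then pause before starting the next update."""
    since = 0
    while True:
        since = sync_once(conn, since)
        time.sleep(pause_seconds)
```

Because the pause starts only after an update has finished, two updates never overlap, even if one takes several minutes. Vote changes simply accumulate in the database between runs, so a document that receives many votes in one interval is re-indexed once rather than once per vote.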