sorting, solr, lucene, solr-boost

Struggling with a Solr query and relevance


I have a problem with boosting when using Solr. We recently switched from Lucene to Solr.

We search against four primary fields: essence, keywords, allSearchable, and quality. For each document in the index, essence contains the first 3 non-stop words in keywords, keywords is just a list of keywords, and allSearchable is a collection of other data for that document. In Lucene, we ranked the results by relevance by running up to 3 searches for whatever the user typed into the search box, like so:

word typed into searchbox: tree

Query 1: +essence:tree (sort by 'quality'). If Query 1 returns enough results for the requested page, return them.

Query 2: +keywords:tree (sort by 'quality'). If Query 1 and Query 2 together return enough results for the requested page, return them.

Query 3: +allSearchable:tree (sort by 'quality'). Return the results. If there aren't any, then tough luck.

My problem is with pagination. With Lucene, I never had to send pagination parameters (startIndex, rows). I could just ask for everything and then iterate over what came back, collecting enough results for the page I was asking for. With Solr, I must pass pagination parameters. We have over 8 million documents in our index, so fetching everything that matches a query like 'tree' is far too expensive. The problem is that if I ask for page 3 in Query 1 and don't get enough results, I must go on to Query 2 (keywords:tree). But then I am asking for page 3 of Query 2's results (in other words, "give me the documents that match 'keywords:tree' for page 3"), and that's not the question I want to ask. I only want page 1 of the keywords matches once the essence matches have run out. And so on.
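To make the offset problem concrete, here is a minimal Python sketch of the arithmetic that a tiered approach would need. It is an illustration only: the function name `tier_windows` is hypothetical, the per-tier match counts are assumed to be known up front (for example from each query's reported total), and the tiers are assumed not to share documents, which may not hold here since essence is derived from keywords.

```python
from typing import List, Tuple


def tier_windows(counts: List[int], page: int, page_size: int) -> List[Tuple[int, int, int]]:
    """Return the (tier_index, start, rows) requests needed to fill one page.

    counts[i] is the total number of matches in tier i
    (0 = essence, 1 = keywords, 2 = allSearchable in the question's setup).
    Assumption of this sketch: no document appears in more than one tier.
    """
    offset = (page - 1) * page_size  # global offset of the page's first result
    remaining = page_size            # results still needed to fill the page
    windows = []
    for tier, count in enumerate(counts):
        if remaining == 0:
            break
        if offset >= count:
            # The page starts beyond this tier; skip it entirely.
            offset -= count
            continue
        rows = min(remaining, count - offset)
        windows.append((tier, offset, rows))
        remaining -= rows
        offset = 0  # any later tier is read from its beginning
    return windows


# Made-up counts: 25 essence, 40 keywords, 1000 allSearchable matches.
print(tier_windows([25, 40, 1000], page=3, page_size=10))
```

With these made-up counts, page 3 needs the last 5 essence matches (start 20) plus the first 5 keywords matches (start 0), not rows 20 to 29 of the keywords query.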

What I am really looking for is ONE query that replaces the three queries I ran before, such that I get back the essence matches first, the keywords matches second, and the allSearchable matches last.

I tried using boosting with this query: essence:tree^4.0 keywords:tree^2.0 allSearchable:tree^1.0

But this doesn't seem to do the trick, and I don't know why. I took out the sorts, and I still don't get the correct results. I am using the default StandardRequestHandler, which seems to use the LuceneQueryParser (not dismax or edismax). I can see that the boosts are being sent to Solr in the URL (I apply boosting by adding a qf parameter to the defaults section of my requestHandler in solrconfig.xml). I certainly know that Lucene can understand these parameters. Can anyone tell me how I might construct one query that returns results in the order outlined above?


Solution

  • I would recommend using the Extended DisMax (eDisMax) query parser, which lets you specify the boosts across the fields, as shown in the example below:

    http://localhost:8983/solr/select/?q=tree
      &defType=edismax&qf=essence^4.0+keywords^2.0+allSearchable^1.0
    

    You might need to adjust the boost values up or down across the fields to get the desired results. There are also additional eDisMax parameters that affect the boosting and how the query is executed, which you should examine.
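As a rough sketch of how a client might assemble that request, the Python snippet below builds the same eDisMax URL and adds Solr's standard `start` and `rows` paging parameters. The helper name `build_edismax_url` and the page values are assumptions made for illustration; the base URL, field names, and boosts are taken from the example above.

```python
from urllib.parse import urlencode


def build_edismax_url(base_url, term, boosts, start=0, rows=10):
    """Build a Solr select URL that uses eDisMax with per-field boosts.

    boosts maps field name -> boost factor, e.g. {"essence": 4.0}.
    """
    # qf is a space-separated list of field^boost entries.
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    params = {
        "q": term,
        "defType": "edismax",
        "qf": qf,
        "start": start,  # zero-based offset of the first result to return
        "rows": rows,    # number of results per page
    }
    # urlencode escapes the spaces and carets in qf for use in a URL.
    return f"{base_url}?{urlencode(params)}"


url = build_edismax_url(
    "http://localhost:8983/solr/select/",
    "tree",
    {"essence": 4.0, "keywords": 2.0, "allSearchable": 1.0},
    start=20,  # page 3 at 10 results per page
    rows=10,
)
print(url)
```

Because a single query now ranks all three fields together, `start` and `rows` page through one combined result list. Boosts influence scores rather than enforce strict tiers, so the ordering is worth checking against real data.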