solr, ibm-watson, retrieve-and-rank

Why isn't the document with an exact match the first result in a Retrieve and Rank Solr query?


We have taken a large quantity of documents, broken them up into segments ("answer units") using Watson's Document Conversion service, and added them to a Retrieve and Rank Solr collection. If I run a query against the collection using a copy/paste of text (maybe 150 words) from one of the answer units, Retrieve and Rank will return a bunch of documents, and (as expected) the results include the answer unit from which the query text was copied. However, that answer unit is not the very top result; it is usually 7 or 8 documents from the top. If I surround the query text with quotes, then Solr rightfully considers that a phrase and returns only that single answer unit. Without the quotes though, shouldn't the document with the exact wording in the query still be the top document in the results?


Solution

  • It seems you are using the /select endpoint to search. The answer unit you copied from is not guaranteed to be the top result, because /select does not run a phrase query. It runs a boolean query over the individual terms, and the final Solr score takes into account factors such as IDF. As you have seen, adding quotes forces a phrase query, if that is what your application wants. That, however, puts the responsibility for knowing which type of query to use on your application.

    Now, if you are using /fcselect and training the system, over time the ranker will "learn" that phrases in your question/document pairs are most important, if that is in fact the case, and it will start ranking those documents higher. That is essentially the point of RnR: to learn from the queries and documents how to bring the most relevant documents to the top, without your application needing to write different (and often complex) Solr queries to find them.
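To make the quoted versus unquoted difference concrete, here is a minimal sketch of how an application might build both forms of the /select request. The base URL, the collection name, and the helper names `escape_phrase` and `build_select_url` are hypothetical and only for illustration; the only Solr behaviour assumed is the standard `q` and `wt` parameters and the rule that a double-quoted string is parsed as a phrase.

```python
from urllib.parse import urlencode

# Hypothetical collection URL; substitute your own cluster and collection.
BASE_URL = "https://example.com/solr/my_collection/select"


def escape_phrase(text: str) -> str:
    """Escape backslashes and double quotes so the text can sit inside a quoted phrase."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_select_url(text: str, as_phrase: bool = False) -> str:
    """Build a /select URL for the given text.

    as_phrase=False sends the text as-is, so Solr scores the individual terms.
    as_phrase=True wraps the text in double quotes, which forces a phrase query.
    """
    q = f'"{escape_phrase(text)}"' if as_phrase else text
    return f"{BASE_URL}?{urlencode({'q': q, 'wt': 'json'})}"


if __name__ == "__main__":
    snippet = "text copied from an answer unit"
    print(build_select_url(snippet))                  # term-by-term boolean query
    print(build_select_url(snippet, as_phrase=True))  # phrase query
```

The unquoted form leaves the text untouched, so any query-syntax characters in the pasted text would still be interpreted by the query parser; a real application would probably want to escape those as well.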