Understanding Solr Doc=


I have two indexed documents, and I am trying to figure out why one scores higher than the other. I ran the query with debugQuery=true to get the explain output. Below is the relevant difference between the two documents.

Two Different Types of Documents

This may be relevant: the documents are of two different types, which I differentiate using a *_s field. My field, moduleid_s, has two values, 1 and 2. My query has:

<arr name="filter_queries">
    <str>moduleid_s:(1 OR 2)</str>
</arr>

so I don't believe this should cause an issue, but I wanted to add this info.

The relevant explain differences:

Document 1 - module type = 1

result of:
  1.7325882 = score(doc=3513280,freq=1.0), product of:
    0.44456035 = queryWeight, product of:
      0.5 = boost
      7.7946143 = idf(docFreq=5286, maxDocs=4721423)
      0.1140686 = queryNorm
    3.8973072 = fieldWeight in 3513280, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = termFreq=1.0
      7.7946143 = idf(docFreq=5286, maxDocs=4721423)
      0.5 = fieldNorm(doc=3513280)

Document 2 - module type = 2

result of:
  0.75800735 = score(doc=174,freq=1.0), product of:
    0.44456035 = queryWeight, product of:
      0.5 = boost
      7.7946143 = idf(docFreq=5286, maxDocs=4721423)
      0.1140686 = queryNorm
    1.7050719 = fieldWeight in 174, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = termFreq=1.0
      7.7946143 = idf(docFreq=5286, maxDocs=4721423)
      0.21875 = fieldNorm(doc=174)

Synopsis and Question

As you can see, the explains are almost identical. Both have the same queryWeight, boost, idf, and queryNorm. What differs is the doc=XXX value: for document 1 it's 3513280, and for document 2 it's 174. Could someone explain what this number is, where it comes from, and why it's different?


Solution

  • That number is the docid. It uniquely identifies the document for retrieval from the index. It has absolutely nothing to do with scoring.

    The real scoring difference is in the fieldNorm:

    • Document 1: 0.5 = fieldNorm
    • Document 2: 0.21875 = fieldNorm

    The fieldNorm is calculated from two figures: the boost given to the field when the document was indexed, and the length of the field (a more precise description can be found in the section on norm(t,d) in the TFIDFSimilarity docs).

    So, either that field is shorter in Document 1, or it was given a higher boost in Document 1 when it was indexed.
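
To make the arithmetic concrete, here is a minimal Python sketch that recomputes both scores from the factors in the explain output and shows that fieldNorm alone accounts for the gap. The `implied_length` helper is hypothetical: it assumes the classic TF-IDF length norm, boost / sqrt(number of terms), and an index-time boost of 1.0, neither of which the explain output confirms. The stored norm is also compressed to a single byte, so any length derived this way is only a rough estimate.

```python
# Factors copied from the explain output; they are identical for both documents.
BOOST = 0.5
IDF = 7.7946143
QUERY_NORM = 0.1140686
TF = 1.0

QUERY_WEIGHT = BOOST * IDF * QUERY_NORM


def score(field_norm):
    """score = queryWeight * fieldWeight, where fieldWeight = tf * idf * fieldNorm."""
    field_weight = TF * IDF * field_norm
    return QUERY_WEIGHT * field_weight


def implied_length(field_norm, index_boost=1.0):
    """Rough term count, ASSUMING fieldNorm = index_boost / sqrt(num_terms).

    This ignores the lossy one-byte encoding of the stored norm,
    so treat the result as an approximation only.
    """
    return (index_boost / field_norm) ** 2


doc1 = score(0.5)      # fieldNorm of document 1
doc2 = score(0.21875)  # fieldNorm of document 2

print("doc1 score:", doc1)
print("doc2 score:", doc2)
print("score ratio:", doc1 / doc2, "fieldNorm ratio:", 0.5 / 0.21875)
print("implied lengths:", implied_length(0.5), implied_length(0.21875))
```

The score ratio matches the fieldNorm ratio because every other factor cancels out. Under the stated assumptions, the matched field in Document 1 would hold about 4 terms and the one in Document 2 about 20, which fits the "shorter field" explanation.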