In a very simple case, I have three documents with filenames "Lark", "Larker", and "Larking" (no file extension). In solr, I index these three documents mapping the filename to a "title" field. When I do a search for "Lark" all three documents are returned (which is what I want) but they are all given the same score. I would prefer that "Lark" be scored the highest, as it is an exact match to my query, with the others coming behind.
<field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
I believe the reason they are getting the same score is because of the EdgeNGramFilterFactory
employed at index time. Each document gets indexed as "La", "Lar", "Lark" with two of the documents ("Larker" and "Larking") being indexed with some additional variations. So in effect each document is an exact match for the query "Lark." I would like some way of executing a query where the term "Lark" would return all three documents but with the document titled "Lark" being returned higher than the others.
Results of query debug:
<lst name="debug">
<str name="rawquerystring">Lark</str>
<str name="querystring">Lark</str>
<str name="parsedquery">text:lark</str>
<str name="parsedquery_toString">text:lark</str>
<lst name="explain">
<str name="543d6ee4cbb33c26bbcf288b/xxnullxx/543d6ef9cbb33c26bbcf2892">
2.7104912 = (MATCH) weight(text:lark in 0) [DefaultSimilarity], result of:
2.7104912 = fieldWeight in 0, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.8332133 = idf(docFreq=3, maxDocs=68)
0.5 = fieldNorm(doc=0)
</str>
<str name="543d6ee4cbb33c26bbcf288b/xxnullxx/543d6ef9cbb33c26bbcf2893">
2.7104912 = (MATCH) weight(text:lark in 1) [DefaultSimilarity], result of:
2.7104912 = fieldWeight in 1, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.8332133 = idf(docFreq=3, maxDocs=68)
0.5 = fieldNorm(doc=1)
</str>
<str name="543d6ee4cbb33c26bbcf288b/xxnullxx/543d6ef9cbb33c26bbcf2894">
2.7104912 = (MATCH) weight(text:lark in 2) [DefaultSimilarity], result of:
2.7104912 = fieldWeight in 2, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.8332133 = idf(docFreq=3, maxDocs=68)
0.5 = fieldNorm(doc=2)
</str>
To boost the exact matches, you could create a new field, called "exact_title", with a new type "text_exact" that doesn't have the EdgeNGramFilterFactory.
In your schema you can use the line:
<copyField source="title" dest="exact_title"/>
to copy title to exact_title.
Then run your query against both fields, title and exact_title. If the query matches an exact title, the document with that exact title will receive a higher score than other documents, and will rise to the top.