I am using Solr to index documents, and now I need to search those documents for an exact phrase and sort the results by the number of times the phrase appears in each document. I also have to show the user how many times the phrase was matched.
I was using the following query (here I am searching for the word SAP):
{
  :params => {
    :wt => "json",
    :indent => "on",
    :rows => 100,
    :start => 0,
    :q => "((content:SAP) AND (doc_type:ClientContact) AND (environment:production))",
    :sort => "termfreq(content,SAP) desc",
    :fl => "id,termfreq(content,SAP)"
  }
}
Of course, this is only a representation of the actual query, which is built by transforming this hash into a query string at runtime.
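For illustration, that transformation could look roughly like the sketch below, which uses only Ruby's standard library (no gems). The endpoint URL and the helper names (solr_query_string, solr_uri, solr_search) are assumptions made for this example, not the application's actual code, and it assumes a Ruby version that ships URI.encode_www_form (1.9.2 or later).

```ruby
require "uri"
require "net/http"
require "json"

# Hypothetical Solr endpoint; adjust host, port, and core to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# Turn the params hash into a URL-encoded query string.
def solr_query_string(params)
  URI.encode_www_form(params)
end

# Build the full request URI for the given params.
def solr_uri(params, base_url = SOLR_SELECT_URL)
  uri = URI.parse(base_url)
  uri.query = solr_query_string(params)
  uri
end

# Send the request and parse the JSON response (needs a running Solr).
def solr_search(params, base_url = SOLR_SELECT_URL)
  JSON.parse(Net::HTTP.get(solr_uri(params, base_url)))
end

params = {
  :wt => "json",
  :rows => 100,
  :start => 0,
  :q => "((content:SAP) AND (doc_type:ClientContact) AND (environment:production))",
  :sort => "termfreq(content,SAP) desc",
  :fl => "id,termfreq(content,SAP)"
}

# Print the request URL without contacting Solr.
puts solr_uri(params)
```

Calling solr_search(params) would perform the actual request; it is left out of the top-level script so the sketch runs without a Solr server.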
I managed to get the search working by using content:"the query here" instead of content:the query here, but the hard part is returning and sorting by the termfreq.
Any ideas on how I could make this work?
Note: I am using Ruby, but this is a legacy application and I can't use any RubyGems, so I am using Solr's HTTP interface directly.
I was able to make it work by adding a ShingleFilter to my schema.xml.
In my case I started using Sunspot, so I just had to make the following change:
<!-- *** This fieldType is used by Sunspot! *** -->
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- This is the line I added -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="true"/>
  </analyzer>
</fieldType>
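To make it clearer why this helps, here is a rough Ruby approximation of what that analyzer chain indexes: each word on its own, plus every run of two to four adjacent words joined by a space, so a short phrase ends up as a single term that termfreq can count. The shingles helper and its regex tokenizer are simplifications I made up for illustration; the real StandardTokenizer is more sophisticated.

```ruby
# Rough approximation of the analyzer chain above: tokenize, lowercase,
# then emit unigrams plus shingles of 2..max_size adjacent tokens.
def shingles(text, max_size = 4)
  # Simplified tokenizer: runs of letters and digits (an assumption,
  # not the actual StandardTokenizer rules).
  tokens = text.downcase.scan(/[[:alnum:]]+/)
  terms = []
  tokens.each_index do |i|
    (1..max_size).each do |size|
      # Stop when the shingle would run past the last token.
      break if i + size > tokens.length
      terms << tokens[i, size].join(" ")
    end
  end
  terms
end

p shingles("The query here")
# => ["the", "the query", "the query here", "query", "query here", "here"]
```

With maxShingleSize="4", a phrase of more than four words would not be indexed as a single term, so it presumably could not be counted this way.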
After making that change, restarting Solr, and reindexing, I was able to use termfreq(content, "the query here") in my query (q=), in the returned fields (fl=), and even for sorting (sort=).
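As a sketch of how the request parameters might then be built for an arbitrary phrase: the phrase_termfreq_params helper below is hypothetical, it uses the quoted phrase query in q for simplicity, and it leaves out the doc_type and environment filters. It rejects phrases containing quotes or backslashes rather than guessing at escaping rules, and it refuses phrases longer than the configured shingle size (counted by whitespace, which only approximates the tokenizer).

```ruby
require "uri"

# Should match maxShingleSize in the fieldType definition.
MAX_SHINGLE_SIZE = 4

# Build Solr params that match a phrase and sort by how often it occurs.
def phrase_termfreq_params(phrase, field = "content")
  if phrase =~ /["\\]/
    raise ArgumentError, "phrase must not contain quotes or backslashes"
  end
  if phrase.split.length > MAX_SHINGLE_SIZE
    raise ArgumentError, "phrase has more than #{MAX_SHINGLE_SIZE} words"
  end

  # The phrase is passed to termfreq as one quoted term.
  freq = %Q{termfreq(#{field},"#{phrase}")}
  {
    :wt => "json",
    :q => %Q{#{field}:"#{phrase}"},
    :sort => "#{freq} desc",
    :fl => "id,#{freq}"
  }
end

params = phrase_termfreq_params("the query here")
puts URI.encode_www_form(params)
```

The resulting hash can be turned into a query string and sent over the HTTP interface like any other Solr request.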