I am trying to get the unique values for a field from Solr, and I am using faceting to do it. My facet query looks like this:
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.setFacet(true);
query.addFacetField("division");
I print the facet values like this:
QueryResponse resp = solrClient.query(query);
List<FacetField> fflist = resp.getFacetFields();
for (FacetField ff : fflist) {
    String ffname = ff.getName();
    int ffcount = ff.getValueCount();
    System.out.println(ffname + " " + ffcount);
    List<Count> counts = ff.getValues();
    for (Count c : counts) {
        String facetLabel = c.getName();
        long facetCount = c.getCount();
        System.out.println("facetlabel-->" + facetLabel + " facetcount-->" + facetCount);
    }
}
This is the output I get:
facetlabel-->seirossecca facetcount-->184
facetlabel-->accessori facetcount-->184
facetlabel-->seirossecca facetcount-->184
facetlabel-->cinht facetcount-->116
facetlabel-->cinht facetcount-->116
facetlabel-->ethnic facetcount-->116
facetlabel-->spot facetcount-->851
facetlabel-->spot facetcount-->851
facetlabel-->top facetcount-->851
facetlabel-->raewtoof facetcount-->577
facetlabel-->footwear facetcount-->577
facetlabel-->raewtoof facetcount-->577
facetlabel-->smottob facetcount-->387602
facetlabel-->bottom facetcount-->387602
facetlabel-->smottob facetcount-->387602
facetlabel-->ytuaeb facetcount-->354158
facetlabel-->beauti facetcount-->354158
facetlabel-->ytuaeb facetcount-->354158
facetlabel-->scinortcel facetcount-->204244
facetlabel-->electron facetcount-->204244
facetlabel-->scinortcel facetcount-->204244
facetlabel-->sesserd facetcount-->161
facetlabel-->dress facetcount-->161
facetlabel-->sesserd facetcount-->161
As you can see, I am getting scrambled (anagram-like) versions of each facet value as separate entries, all with the same count. The division field is of type text_search, which is defined in schema.xml as follows:
<fieldType name="text_search" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0" generateWordParts="1" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0" generateWordParts="1" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
This is because you are using ReversedWildcardFilterFactory in your text_search field type. It is a filter that reverses tokens, and that is exactly what is happening here: seirossecca is the reverse of accessories. The accessori entry comes from PorterStemFilterFactory, which removes common endings from words and so shortens accessories to accessori.
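The reversal is easy to confirm outside Solr. The following is only a minimal sketch in plain Java: the class name ReverseTokenDemo and the reverse helper are made up for illustration and are not part of Solr or SolrJ, and a plain string reversal stands in for what the filter does to each token. The stemming step is not reproduced here.

```java
// Hypothetical demo class, not part of Solr or SolrJ.
class ReverseTokenDemo {

    // Plain string reversal, standing in for the token reversal
    // that shows up in the facet output.
    static String reverse(String token) {
        return new StringBuilder(token).reverse().toString();
    }

    public static void main(String[] args) {
        // Each reversed word matches one of the odd facet labels.
        System.out.println(reverse("accessories")); // seirossecca
        System.out.println(reverse("footwear"));    // raewtoof
        System.out.println(reverse("bottoms"));     // smottob
    }
}
```

Comparing these with the facet labels in the question shows that the odd entries are the original words written backwards.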
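To avoid this, remove ReversedWildcardFilterFactory from your schema.xml and reindex, since the reversed tokens are already stored in the index. Whether to keep PorterStemFilterFactory is up to you: keep it only if you want common endings removed from words, in which case the facet values will still be stemmed (accessori rather than accessories).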