java solr solrj facet

Facet field returns the same value multiple times, along with anagrams


I am trying to get the unique values of a field from Solr, using faceting to retrieve them. My facet query looks like this:

        SolrQuery query = new SolrQuery();
        query.setQuery("*:*");
        query.setFacet(true);
        query.addFacetField("division");

I print the facet values with:

        QueryResponse resp = solrClient.query(query);

        // One FacetField per faceted field ("division" here)
        List<FacetField> fflist = resp.getFacetFields();
        for (FacetField ff : fflist) {
            String ffname = ff.getName();
            int ffcount = ff.getValueCount();

            System.out.println(ffname + " " + ffcount);

            // Each Count holds one facet label and its document count
            List<Count> counts = ff.getValues();
            for (Count c : counts) {
                String facetLabel = c.getName();
                long facetCount = c.getCount();

                System.out.println("facetlabel-->" + facetLabel + " facetcount-->" + facetCount);
            }
        }

I get the following output:

facetlabel-->seirossecca facetcount-->184
facetlabel-->accessori facetcount-->184
facetlabel-->seirossecca facetcount-->184
facetlabel-->cinht facetcount-->116
facetlabel-->cinht facetcount-->116
facetlabel-->ethnic facetcount-->116
facetlabel-->spot facetcount-->851
facetlabel-->spot facetcount-->851
facetlabel-->top facetcount-->851
facetlabel-->raewtoof facetcount-->577
facetlabel-->footwear facetcount-->577
facetlabel-->raewtoof facetcount-->577
facetlabel-->smottob facetcount-->387602
facetlabel-->bottom facetcount-->387602
facetlabel-->smottob facetcount-->387602
facetlabel-->ytuaeb facetcount-->354158
facetlabel-->beauti facetcount-->354158
facetlabel-->ytuaeb facetcount-->354158
facetlabel-->scinortcel facetcount-->204244
facetlabel-->electron facetcount-->204244
facetlabel-->scinortcel facetcount-->204244
facetlabel-->sesserd facetcount-->161
facetlabel-->dress facetcount-->161
facetlabel-->sesserd facetcount-->161

As you can see, each facet value appears as several separate entries, including an anagram of the value, all with the same count. The division field is of type:

text_search

The text_search definition in schema.xml is:

<fieldType name="text_search" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true" multiValued="true">
        <analyzer type="index">
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.ReversedWildcardFilterFactory"/>
          <filter class="solr.PorterStemFilterFactory"/>
          <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0" generateWordParts="1" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
        <analyzer type="query">
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.PorterStemFilterFactory"/>
          <filter class="solr.ReversedWildcardFilterFactory"/>
          <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0" generateWordParts="1" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
    </fieldType>

Solution

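  • This happens because your analyzer includes ReversedWildcardFilterFactory.

    ReversedWildcardFilterFactory: a filter that reverses tokens. By default it keeps the original token as well, so both forms end up in the index. Faceting on an analyzed text field returns the indexed terms, not the original field values, which is why the reversed forms show up as separate facet entries with the same count.

    That is what you are seeing: seirossecca is accessories reversed, and accessori is accessories shortened by PorterStemFilterFactory, which removes common endings from words.

    To avoid this, remove ReversedWildcardFilterFactory from your schema.xml and reindex.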

    PorterStemFilterFactory: whether to keep it is up to you. Remove it as well if you do not want common endings stripped from the facet values.
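
    A quick way to check the explanation is to reverse the odd-looking facet labels and see that they spell the original division names. The following is a minimal sketch using only the JDK; the class name ReversedFacetDemo and the helper reverse are hypothetical names made up for illustration, and the labels are copied from the output in the question. It does not call Solr or reproduce the real filter, it only shows the string reversal.

```java
public class ReversedFacetDemo {

    // Plain string reversal, used here only to illustrate what a
    // token-reversing filter does to each word (hypothetical helper).
    static String reverse(String token) {
        return new StringBuilder(token).reverse().toString();
    }

    public static void main(String[] args) {
        // Facet labels copied from the question's output
        String[] labels = {"seirossecca", "raewtoof", "smottob", "sesserd"};
        for (String label : labels) {
            System.out.println(label + " -> " + reverse(label));
        }
    }
}
```

    Running it shows that seirossecca, raewtoof, smottob and sesserd are accessories, footwear, bottoms and dresses spelled backwards, which matches the reversed-token explanation above.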