
Sorting a Query in Solrj with whitespace


I'm using Solr to query some documents. In this one case I don't want the results ordered by relevance; I want them sorted by title. I've done the following in SolrJ:

// sort by title (query is the SolrQuery instance)
query.setSortField("title", SolrQuery.ORDER.asc);

This works fine as long as the document titles contain no whitespace or slashes. With these 4 documents, however, the title values come back in this order:

"A"
"B"
"C"
"B D"

It seems to me that Solr sorts on whatever follows the first whitespace in the field. Any ideas why this happens?


Solution

  • The example I described was a constructed one, but I have now tested it with exactly those titles as well and saw the same behaviour.

    I use the following config for the field:

    <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
       <analyzer type="index">
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <filter class="solr.StopFilterFactory" enablePositionIncrements="true" words="stopwords.txt" ignoreCase="true"/>
          <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" catenateAll="0" catenateNumbers="1" catenateWords="1" generateNumberParts="1" generateWordParts="1"/>
          <filter class="solr.LowerCaseFilterFactory"/>
       </analyzer>
       <analyzer type="query">
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <filter class="solr.SynonymFilterFactory" ignoreCase="true" expand="true" synonyms="synonyms.txt"/>
          <filter class="solr.StopFilterFactory" enablePositionIncrements="true" words="stopwords.txt" ignoreCase="true"/>
          <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" catenateAll="0" catenateNumbers="0" catenateWords="0" generateNumberParts="1" generateWordParts="1"/>
          <filter class="solr.LowerCaseFilterFactory"/>
       </analyzer>
    </fieldType>
    

    Thanks for your replies.
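
A likely explanation, given the config above: `textgen` is a tokenized `solr.TextField`, so a title such as "B D" is indexed as two separate terms, and sorting in Solr is only well defined on fields that produce a single term per document. A common workaround is to copy the title into an untokenized field and sort on that one instead. The fragment below is a sketch for schema.xml, not a tested configuration: the field name `title_sort` is hypothetical, and it assumes the source field is called `title`, as in the sort call in the question.

```xml
<!-- Untokenized type: the whole title is stored as one term, so sorting is well defined. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- Hypothetical sort-only field: indexed for sorting, not stored or returned. -->
<field name="title_sort" type="string" indexed="true" stored="false" multiValued="false"/>

<!-- Assumes the searchable field is named "title"; its raw value is copied before analysis. -->
<copyField source="title" dest="title_sort"/>
```

After reindexing, the sort would target the new field, for example `setSortField("title_sort", SolrQuery.ORDER.asc)`, while searches keep using `title`. Note that `solr.StrField` sorts case-sensitively; if that matters, a `solr.TextField` with `solr.KeywordTokenizerFactory` and `solr.LowerCaseFilterFactory` is an alternative that still yields one term per document.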