
Django-haystack sort results by title


I'd like to sort the results of my django-haystack query by title.

from haystack.query import SearchQuerySet
for result in SearchQuerySet().all().order_by('result_title_sort'):
    print(result.result_title_sort)

However, I keep getting this error:

there are more terms than documents in field "result_title_sort", but it's impossible to sort on tokenized fields

This is my haystack field definition:

result_title_sort = CharField(indexed=True, model_attr='title')

How should I define the field, so I can sort on it?


Solution

  • You need to make sure that your sorting field is non-tokenized in Solr. A tokenized "text" field is split into several terms per document, which is why Solr refuses to sort on it. The Haystack documentation does not make it clear how to declare a non-tokenized field from Haystack itself. My solution was to edit the schema.xml generated by Haystack so that the field type is "string" instead of "text". So instead of having something like this in your schema.xml:

    <field name="result_title_sort" type="text" indexed="true" stored="true" multiValued="false" />
    

    you need to have this:

    <field name="result_title_sort" type="string" indexed="true" stored="true" multiValued="false" />
    

    Since you will probably regenerate schema.xml many times, I recommend a build script that generates the schema file and applies this change automatically. Something like this:

    ./manage.py build_solr_schema | sed 's/<field name="result_title_sort" type="text"/<field name="result_title_sort" type="string"/' > schema.xml
    

    (or for Haystack 2.0, where the generated type is "text_en"; this example uses a field called name_sort)

    ./manage.py build_solr_schema | sed 's/<field name="name_sort" type="text_en"/<field name="name_sort" type="string"/' > schema.xml
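
    If you would rather not depend on sed, the same substitution can be sketched in Python. This is only an illustration under assumptions: the helper name force_string_type is hypothetical, and it assumes the generated <field> element lists the name attribute immediately before type, as in the lines shown above.

```python
import re


def force_string_type(schema_xml, field_name):
    """Return schema_xml with the named <field> declared as type="string".

    Assumes the attribute order name="..." type="..." used in the
    schema lines quoted above; other fields are left untouched.
    """
    pattern = re.compile(
        r'(<field\s+name="' + re.escape(field_name) + r'"\s+type=")[^"]*(")'
    )
    return pattern.sub(r'\1string\2', schema_xml)


# Example with the field definition from the question.
sample = (
    '<field name="result_title_sort" type="text" indexed="true" '
    'stored="true" multiValued="false" />'
)
print(force_string_type(sample, "result_title_sort"))
```

    Because the pattern matches any existing type value, the same call covers both the "text" and the "text_en" variants. To use it in a build script, read the output of build_solr_schema, pass it through the helper, and write the result to schema.xml.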
    

    After I did this, sorting worked, but in ASCII order, which puts lowercase letters after uppercase ones and non-Roman characters at the end. So I wrote the following function to prepare the text for sorting. It uses the unidecode module to convert non-ASCII characters to ASCII equivalents, lowercases the text, strips surrounding whitespace, and optionally removes a leading "the" or "a":

    import re

    from unidecode import unidecode  # third-party package


    def format_text_for_sort(sort_term, remove_articles=False):
        ''' processes text for sorting field:
            * converts non-ASCII characters to ASCII equivalents
            * converts to lowercase
            * removes outside spaces
            * (optional) removes a leading a/the
        '''
        sort_term = unidecode(sort_term).lower().strip()
        if remove_articles:
            # Runs after strip(), so the article is at the very start.
            sort_term = re.sub(r'^(a\s+|the\s+)', '', sort_term)
        return sort_term
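
    To see the ASCII ordering problem and the effect of normalizing, here is a small standalone sketch. It deliberately swaps in the standard library's unicodedata for unidecode so that it runs without extra packages; that stand-in only strips accents from Latin letters and does not transliterate other scripts the way unidecode does. The function name ascii_sort_key and the sample titles are made up for illustration.

```python
import re
import unicodedata


def ascii_sort_key(text, remove_articles=False):
    """Approximate the sort formatter using only the standard library."""
    # Decompose accented letters, then drop anything that is not ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    key = decomposed.encode("ascii", "ignore").decode("ascii")
    key = key.lower().strip()
    if remove_articles:
        key = re.sub(r"^(a\s+|the\s+)", "", key)
    return key


titles = ["The Zebra", "apple", "Éclair", "  A Banana"]

# Plain code-point order: leading spaces, then uppercase, lowercase, accented.
print(sorted(titles))

# Order by the normalized key instead.
by_key = sorted(titles, key=lambda t: ascii_sort_key(t, remove_articles=True))
print(by_key)
```

    With Solr the normalized value is what gets stored in the sort field, so the ordering happens on the server; the sorted() calls here only mimic that comparison locally.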
    

    Then add a prepare method for the field to your index class in search_indexes.py, so that the formatter is called when each object is indexed. Something like:

    def prepare_result_title_sort(self, obj):
        return format_text_for_sort(obj.title, remove_articles=True)