python django solr django-haystack django-1.8

Haystack, Solr and stopwords


I am trying to use the stopwords feature with Haystack and Solr, but it does the opposite of what I expect: instead of getting no results, I get every document in the index. This only happens when the query is performed through Haystack; in the Solr web interface it works fine.

#versions
Django 1.8
django-haystack 2.4.1
solr 4.10.2

Here are the Solr log entries for the same stopword, "les", queried first from the Solr web interface and then through Haystack:

#solr
INFO  - 2016-02-13 10:14:26.520; org.apache.solr.core.SolrCore; [collection1] webapp=/solr path=/select params={indent=true&q=les&_=1455358468201&wt=json} hits=0 status=0 QTime=0

#haystack
INFO  - 2016-02-13 10:16:00.372; org.apache.solr.core.SolrCore; [collection1] webapp=/solr path=/select/ params={fl=*+score&sort=cname+asc,+pub_date+desc&start=0&q=(visible:(true)+AND+(les))&wt=json&fq=django_ct:(nav.pages+OR+nav.rubrique+OR+annuaire_commerces.adressecommerce+OR+agenda.event+OR+news.actualite+OR+annuaire_associations.adresseassoc)&rows=70} hits=70 status=0 QTime=3

#views
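from haystack.generic_views import SearchView

# searchForm is the project's own search form; its import is not shown in the question.


class search(SearchView):
    template_name = 'search/search1.html'
    form_class = searchForm

    def get_queryset(self):
        queryset = super(search, self).get_queryset()
        # Restrict results to visible documents, sorted by name, then newest first.
        return queryset.filter(visible=True).order_by('cname', '-pub_date')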

Something happens on the Solr side when the query comes from Haystack: Solr knows the word is a stopword, but it seems to transform the query into:

*:*

and match all documents, although I can't see this in the log.

Maybe I should create a stopword list in my Django project and return an empty SearchQuerySet if the word is in that list?

I would really appreciate some help with this; I can't be the only one with this issue.

Thanks.


Solution

  • Stopwords are words that are removed from the index (and from the query), so once "les" is dropped your query is just visible:true, which matches every visible document. They do not "stop" the query in any way.

    A possible solution might be to remove the stopwords only at index time while retaining them when querying (that is, using a different analysis chain for indexing and for querying), so that the query gets no hits because the token isn't found in the index.

    But that would probably break other things, like querying for "time of change" when the only indexed value is "time change". Stopwords might not be the thing you're looking for to solve the issue you're having.
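
If you still want the behaviour described in the question (no results when the user types only stopwords), the check can be done on the Django side before the query reaches Solr. The following is only a minimal sketch: the STOPWORDS set and the has_searchable_terms helper are hypothetical names made up for illustration, and the word list is an assumption that you would need to keep in sync with the stopwords file your Solr field actually uses.

```python
import re

# Hypothetical stopword list (an assumption, not taken from the question).
# In practice it should mirror the stopwords file configured in Solr.
STOPWORDS = frozenset({"le", "la", "les", "de", "des", "du", "un", "une", "et"})


def has_searchable_terms(query, stopwords=STOPWORDS):
    """Return True if at least one token of `query` is not a stopword.

    An empty query, or one made up only of stopwords, returns False.
    """
    # Lowercase and split on non-word characters; this is a rough stand-in
    # for Solr's own tokenizer, not a reproduction of it.
    tokens = re.findall(r"\w+", query.lower())
    return any(token not in stopwords for token in tokens)
```

In the view, get_queryset could call this helper on the submitted query string (for example the q parameter in self.request.GET, assuming the default form field name) and return queryset.none() when it reports no searchable terms. The tokenization here is deliberately simple and will not match Solr's analysis exactly, so treat it as a guard for the obvious case rather than a replacement for the analyzer.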