python, django, elasticsearch, django-haystack

Django-Haystack & Elasticsearch break with queries containing special characters


I've been trying to fix a bug that really annoys me: Django-Haystack & Elasticsearch queries work with accents, but they break every time the query contains special characters such as a dash (-) or an apostrophe (').

For example, let's use Baie-d'Urfé as the query.

Here's my code:

forms.py

from haystack.forms import FacetedSearchForm
from haystack.inputs import Exact


class FacetedProductSearchForm(FacetedSearchForm):

    def __init__(self, *args, **kwargs):
        # Keep our own copy of the submitted data so search() can reuse it.
        data = dict(kwargs.get("data", []))
        self.ptag = data.get('ptags', [])
        self.q_from_data = data.get('q', '')
        super(FacetedProductSearchForm, self).__init__(*args, **kwargs)

    def search(self):
        sqs = super(FacetedProductSearchForm, self).search()

        # Ideally we would tell django-haystack to only apply q to destination
        # ...but we're not sure how to do that, so we'll just re-apply it ourselves here.
        q = self.q_from_data
        sqs = sqs.filter(destination=Exact(q))

        # Debug output
        print('should be applying q: {}'.format(q))
        print(sqs)

        if self.ptag:
            print('filtering with tags')
            print(self.ptag)
            sqs = sqs.filter(ptags__in=[Exact(tag) for tag in self.ptag])

        return sqs

Using FacetedSearch in View.py

class FacetedSearchView(BaseFacetedSearchView):

    form_class = FacetedProductSearchForm
    facet_fields = ['ptags']
    template_name = 'search_result.html'
    paginate_by = 30
    context_object_name = 'object_list'

And my search_indexes.py

from django.utils import timezone
from haystack import indexes

# Product comes from the app's models module (import path not shown in the question).


class ProductIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.EdgeNgramField(
        document=True, use_template=True,
        template_name='search/indexes/product_text.txt')
    destination = indexes.CharField(model_attr="destination")  # boost=1.125

    # Tags
    ptags = indexes.MultiValueField(model_attr='_ptags', faceted=True)

    # For autocomplete
    content_auto = indexes.EdgeNgramField(model_attr='destination')

    # Spelling suggestions
    suggestions = indexes.FacetCharField()

    def get_model(self):
        return Product

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(timestamp__lte=timezone.now())

Any ideas on how to fix this?

Thanks a lot!


Solution

  • The problem seems to be related to Elasticsearch itself, so I removed all my Elasticsearch instances and rewrote my search view to use plain PostgreSQL queries.

    Final observation after solving this:

    $50/month saved and a search engine working like a charm!
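
The rewritten view is not shown in the answer, so the following is only a minimal sketch of the idea, not the author's code. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the table name, columns, sample rows, and the search_destinations helper are all hypothetical. The point it illustrates is that when the search term is passed as a bound parameter, a dash or an apostrophe is treated as plain data rather than as query syntax.

```python
import sqlite3


def search_destinations(conn, q):
    """Return the names of products whose destination equals q exactly."""
    # The "?" placeholder binds q as data, so characters such as "-" and "'"
    # are never interpreted as part of the SQL statement.
    rows = conn.execute(
        "SELECT name FROM product WHERE destination = ? ORDER BY name",
        (q,),
    )
    return [name for (name,) in rows]


# Hypothetical in-memory table standing in for the real Product model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, destination TEXT)")
conn.executemany(
    "INSERT INTO product (name, destination) VALUES (?, ?)",
    [
        ("Lakeside walk", "Baie-d'Urfé"),
        ("Old port tour", "Montréal"),
    ],
)

results = search_destinations(conn, "Baie-d'Urfé")
print(results)
```

In a Django view, the equivalent would presumably be an ORM filter on the Product model, for example Product.objects.filter(destination__iexact=q), which also sends q to the database as a bound parameter.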