Tags: django, solr, django-haystack

Haystack + solr duplicate on update


I'm new to Haystack and Solr, which I'm using together, so this is likely a newbie error.

When I run update_index, it seems to be duplicating the records. I am getting:

get() returned more than one Doctor -- it returned 3!

for this piece of code:

self._object = self.searchindex.read_queryset().get(pk=self.pk) 

If I run update_index again, the number returned increases by one. If I run rebuild_index, it works and shows only one record until I update again.

So it seems that update_index is duplicating records in the index. How do I stop it from doing that?

Here is my haystack search index:

from haystack import indexes
from .models import Doctor, Zipcode
from django.contrib.gis.measure import D
from django.conf import settings

class DoctorIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.EdgeNgramField(document=True, use_template=True)
    name = indexes.EdgeNgramField(model_attr='name')
    specialty = indexes.MultiValueField()
    condition = indexes.MultiValueField()
    procedure = indexes.MultiValueField()
    premium = indexes.BooleanField()
    location = indexes.LocationField(model_attr='main_office__location')

    latitude = indexes.DecimalField(indexed=False)
    longitude = indexes.DecimalField(indexed=False)
    docid = indexes.IntegerField()
    slugify_name = indexes.CharField(indexed=False)
    rendered = indexes.CharField(use_template=True, indexed=False)
    premium_rendered = indexes.CharField(use_template=True, indexed=False)
    include = indexes.BooleanField(indexed=False)

    def get_model(self):
        return Doctor

    def prepare_specialty(self, obj):
        return ["%s %s"%((specialty.parent.name if specialty.parent else ""), specialty.name) for specialty in obj.specialty.all()]

    def prepare_condition(self, obj):
        return [condition.name for condition in obj.conditions.all()]

    def prepare_procedure(self, obj):
        return [procedure.name for procedure in obj.procedures.all()]

    def prepare_premium(self, obj):
        return obj.display()['premium']

    def prepare_latitude(self, obj):
        return obj.main_office.lat

    def prepare_longitude(self, obj):
        return obj.main_office.lon

    def prepare_docid(self, obj):
        return obj.id

    def prepare_slugify_name(self, obj):
        return obj.slugify_name()

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(specialty__search_include=True)

Here is my solr schema: https://gist.github.com/anonymous/5d5b011ca7fa0f3f3e29

I've done a lot of googling, but can't seem to find an answer to this.


Solution

  • This one was tricky to track down, but the problem was actually in my index_queryset function.

    This:

    return self.get_model().objects.filter(specialty__search_include=True)
    

    should actually be this:

    return self.get_model().objects.filter(specialty__search_include=True).distinct()
    

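    The queryset returned by that function contained duplicate rows, and that was causing my error, not the Solr schema as I had thought. specialty is a ManyToManyField, so filtering across it joins the related table and returns the same Doctor once for every matching specialty. Haystack's read_queryset() falls back to index_queryset() by default, so the get(pk=self.pk) call from the error ran against those duplicated rows.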
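
To see why the filter returns the same Doctor several times, here is a minimal sketch that reproduces the join in plain SQL with the standard-library sqlite3 module instead of Django. The table and column names (doctor, specialty, doctor_specialty) are hypothetical stand-ins for whatever Django generates for the real models, and the sample data assumes one doctor linked to three specialties that all have search_include set.

```python
import sqlite3

# Hypothetical schema standing in for the Doctor / Specialty models
# and the many-to-many link table between them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE doctor (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE specialty (id INTEGER PRIMARY KEY, name TEXT, search_include INTEGER);
    CREATE TABLE doctor_specialty (doctor_id INTEGER, specialty_id INTEGER);

    INSERT INTO doctor VALUES (1, 'Dr. A');
    INSERT INTO specialty VALUES
        (1, 'Cardiology', 1),
        (2, 'Internal Medicine', 1),
        (3, 'Geriatrics', 1);
    INSERT INTO doctor_specialty VALUES (1, 1), (1, 2), (1, 3);
""")

# Roughly the shape of the query behind
# Doctor.objects.filter(specialty__search_include=True):
# one result row per (doctor, matching specialty) pair.
QUERY = """
    SELECT {select} doctor.id, doctor.name
    FROM doctor
    JOIN doctor_specialty ON doctor_specialty.doctor_id = doctor.id
    JOIN specialty ON specialty.id = doctor_specialty.specialty_id
    WHERE specialty.search_include = 1
"""

rows_plain = conn.execute(QUERY.format(select="")).fetchall()
rows_distinct = conn.execute(QUERY.format(select="DISTINCT")).fetchall()

print(len(rows_plain))     # 3: the same doctor, once per matching specialty
print(len(rows_distinct))  # 1: DISTINCT collapses the repeated rows
```

The three rows without DISTINCT correspond to the "it returned 3!" in the error, and adding .distinct() to the queryset plays the same role as DISTINCT here.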