rebuild_index does not index new items with Haystack, Elasticsearch, and Django


I have been looking for a solution to this but can't find anything. I have just started using Haystack 2.6.1 with elasticsearch==2.4.1 and Elasticsearch server 2.4.6. After following Haystack's Getting Started guide, I was able to perform searches and get results. I then ran:

python manage.py rebuild_index

It worked the first time, since, as I said, I was able to perform searches. But then I added 3 entries to my database and tried to rebuild and/or update the index, and now I see this:

RuntimeWarning: DateTimeField Order.dateEntered received a naive datetime (2017-11-11 16:07:54.324473) while time zone support is active.
  RuntimeWarning)
Indexing 32 orders
GET /haystack/_mapping [status:404 request:0.004s]

As a result, my new entries still do not show up when I search. I now have 35 orders, but only 32 were indexed, and I am seeing a 404 response for GET /haystack/_mapping. Sorry, I am new to this, so some of these questions may seem silly:

  1. What is Haystack expecting to GET? I have a local Elasticsearch server running, but is there supposed to be a Haystack server as well?
  2. Would Haystack fail to index new items because of the naive datetime warning?
  3. Do I have to restart the Elasticsearch server each time I add new items to the database?

UPDATE: In the morning I restarted the Elasticsearch server and reran python manage.py rebuild_index, and this time it indexed all 35 orders. But when I added another entry and reran the command, it still indexed only 35 items. I now have 36, so it should be indexing 36.


Solution

  • It turns out that this warning:

    RuntimeWarning: DateTimeField Order.dateEntered received a naive datetime (2017-11-11 16:07:54.324473) while time zone support is active.
      RuntimeWarning)
    

    was the issue. In search_indexes.py, the set of objects to index is determined by the index_queryset method. In the 'Getting Started' documentation, index_queryset looks like this:

    import datetime

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(pub_date__lte=datetime.datetime.now())
    

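    But datetime.datetime.now() returns a naive datetime in the server's local time. With time zone support active, Django warns and, as far as I can tell, interprets that value in the default time zone (the TIME_ZONE setting). If the two differ, the cutoff can land hours in the past, so rebuild_index skips the most recently added items. That would also explain why the missing orders were picked up the next morning. So I used timezone.now() from django.utils instead: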

    from django.utils import timezone
    
    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(pub_date__lte=timezone.now())
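
To see why the naive cutoff can skip recent rows, here is a minimal sketch using only the standard library, with no Django or Haystack involved. It assumes a server clock five hours behind UTC and TIME_ZONE set to "UTC". Both values are hypothetical, and SERVER_OFFSET, naive_cutoff_as_utc, and aware_cutoff are illustrative names, not part of any library.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server's wall clock is 5 hours behind UTC, while
# Django's TIME_ZONE is "UTC". Both values are hypothetical.
SERVER_OFFSET = timezone(timedelta(hours=-5))


def naive_cutoff_as_utc(now_utc):
    """Mimic datetime.datetime.now() on the server, then reinterpret the
    naive result as UTC, as happens to a naive value when time zone
    support is active and the default time zone is UTC."""
    naive_local = now_utc.astimezone(SERVER_OFFSET).replace(tzinfo=None)
    return naive_local.replace(tzinfo=timezone.utc)


def aware_cutoff(now_utc):
    """Mimic django.utils.timezone.now(): an aware datetime in UTC."""
    return now_utc


# A fixed "current time" so the example is reproducible.
now_utc = datetime(2017, 11, 11, 21, 7, 54, tzinfo=timezone.utc)
created = now_utc - timedelta(minutes=10)  # a row saved ten minutes ago

print(created <= naive_cutoff_as_utc(now_utc))  # False: the row is skipped
print(created <= aware_cutoff(now_utc))         # True: the row is indexed
```

Under these assumptions the naive cutoff sits five hours in the past, so anything saved in the last five hours fails the pub_date__lte filter until enough time has passed.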