
Python Elasticsearch indexing error


Elasticsearch was working fine until today.

Issue:

Some documents are failing to index with this error:

u'Limit of total fields [1000] in index [mintegral_incent] has been exceeded' 

Error:

"BulkIndexError: (u'14 document(s) failed to index.', [{u'index': {u'status': 400, u'_type': u'mintegral_incent', u'_id': u'168108082', u'error': {u'reason': u'Limit of total fields [1000] in index [mintegral_incent] has been exceeded', u'type': u'illegal_argument_exception'}
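When a bulk request reports many failures, it can help to group them by reason before debugging. The sketch below is an illustrative helper, not part of the original post; `summarize_bulk_errors` is a hypothetical name, and it assumes the per-item failure list has the shape printed above (the second element of the `BulkIndexError` arguments).

```python
from collections import defaultdict


def summarize_bulk_errors(errors):
    """Group failed bulk items by error reason and collect their document ids.

    `errors` is assumed to be the per-item failure list carried by a
    BulkIndexError, e.g.
    [{'index': {'status': 400, '_id': '168108082',
                'error': {'type': 'illegal_argument_exception', 'reason': '...'}}}]
    """
    by_reason = defaultdict(list)
    for item in errors:
        # Each item has one key naming the operation ('index', 'create', ...).
        for op_result in item.values():
            error = op_result.get("error", {})
            # Some server versions report the error as a plain string.
            if isinstance(error, dict):
                reason = error.get("reason", "unknown")
            else:
                reason = str(error)
            by_reason[reason].append(op_result.get("_id"))
    return dict(by_reason)


# Hypothetical usage around the bulk call from the question:
# try:
#     helpers.bulk(es_repo, actions)
# except helpers.BulkIndexError as exc:
#     print(summarize_bulk_errors(exc.args[1]))
```

If every failed id maps to the same "Limit of total fields" reason, the problem lies in the index mapping rather than in individual malformed documents.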

I'm using the Amazon Elasticsearch Service.

Elasticsearch version: 5.1

ES setup:

from elasticsearch import Elasticsearch
from elasticsearch import helpers

# settings.ES_INDEX_URL is the endpoint URL from the project's own settings module
es_repo = Elasticsearch(hosts=[settings.ES_INDEX_URL],
                        verify_certs=True)

Code that raises the error:

def bulk_index_offers(index_name, id_field, docs):
    actions = []
    for doc in docs:
        action = {
            "_index": index_name,
            "_type": index_name,
            "_id": doc.get(id_field),
            "_source": doc
        }
        actions.append(action)
    # The error is raised on the following line.
    resp = helpers.bulk(es_repo, actions)
    return resp
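To check whether the documents themselves introduce that many distinct fields (for example, dict keys that vary from document to document), one can roughly count the dotted field paths across a batch before indexing. This is a hypothetical diagnostic sketch, not an Elasticsearch API. It counts object fields as well as leaf fields, because object mappings also count toward the limit, but it does not count multi-fields such as the `.keyword` sub-field that dynamic mapping adds to strings, so the real mapping may be larger than this estimate.

```python
def field_paths(doc, prefix=""):
    """Return the set of dotted field paths a document would add to a mapping.

    Object fields (e.g. 'meta') and leaf fields (e.g. 'meta.country') are both
    included. Multi-fields such as '.keyword' are NOT included.
    """
    paths = set()
    for key, value in doc.items():
        path = prefix + key
        paths.add(path)
        # A list of objects is mapped like a single object, so merge its keys.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                paths |= field_paths(item, path + ".")
    return paths


def count_fields(docs):
    """Count the distinct field paths across a batch of documents."""
    all_paths = set()
    for doc in docs:
        all_paths |= field_paths(doc)
    return len(all_paths)
```

Calling `count_fields(docs)` on the batch passed to `bulk_index_offers` gives a lower bound on the number of mapped fields. A result that approaches 1000, or that grows with every batch, suggests that values are being used as keys somewhere in the documents.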

What I have tried:

I have tried using smaller chunks and increasing read_timeout from the default 10 to 30, like this: resp = helpers.bulk(es_repo, actions, chunks=500, read_timeout=30)

But I'm still facing the same issue.

Please help.


Solution

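  • By default, an index may contain at most 1000 fields (the index.mapping.total_fields.limit setting), and your documents push the mapping past that limit. Changing the chunk size or the timeout cannot help, because the rejection comes from the mapping itself. To raise the threshold, run this command: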

    PUT mintegral_incent/_settings
    { 
      "index": {
        "mapping": {
          "total_fields": {
            "limit": "2000"
          }
        }
      }
    }
    

    Using curl, it'd look like this:

    curl -XPUT http://<your.amazon.host>/mintegral_incent/_settings -H 'Content-Type: application/json' -d '{ 
      "index": {
        "mapping": {
          "total_fields": {
            "limit": "2000"
          }
        }
      }
    }'
    

    Then you can run your bulk script again and it should work.
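The same settings change can be made from Python. The sketch below assumes `client` is an elasticsearch-py `Elasticsearch` instance (such as `es_repo` from the question) and uses only its `indices.put_settings` method. `raise_total_fields_limit` is a hypothetical helper name, and the value 2000 simply mirrors the command above.

```python
def raise_total_fields_limit(client, index_name, limit=2000):
    """Raise index.mapping.total_fields.limit on an existing index.

    `client` is assumed to expose elasticsearch-py's
    `indices.put_settings(index=..., body=...)`.
    """
    # Same nested body as the REST request: index -> mapping -> total_fields -> limit.
    body = {"index": {"mapping": {"total_fields": {"limit": limit}}}}
    return client.indices.put_settings(index=index_name, body=body)
```

For example, calling `raise_total_fields_limit(es_repo, "mintegral_incent")` once before rerunning `bulk_index_offers` should have the same effect as the curl command.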