Elasticsearch: getting exception "inference process queue is full. Unable to execute command" while doing bulk insert using Python

I am new to Elasticsearch. I am trying to bulk insert documents using Python into an Elasticsearch index that uses an NLP model, through an ingest pipeline, to convert text into embeddings. However, not all documents are being inserted: only 2,000 of the 40k documents end up in the index.

Elasticsearch version: 8.3

I get the following exception when calling the bulk insert command:

{'index': {'_index': 'index_name', '_id': '40962', 'status': 500, 'error': {'type': 'exception', 'reason': 'org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: inference process queue is full. Unable to execute command', 'caused_by': {'type': 'es_rejected_execution_exception', 'reason': 'inference process queue is full. Unable to execute command'}}}},

Please help.

Solution

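This happens because each document sent through the inference ingest processor becomes an inference request on the model deployment's queue, and once that queue is full, further requests are rejected. It tends to occur when many documents are ingested at once through a model that takes a while to run inference.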

The solution is to do one of the following:

Increase the inference deployment's queue size to match your bulk ingest size (the queue_capacity query parameter of the start trained model deployment API).
Reduce your bulk request size to at most the default queue size (1024), and wait for each bulk request to finish before sending the next one.
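The second option can be sketched in Python as follows. This is a minimal sketch, assuming the official `elasticsearch` Python client 8.x, whose `Elasticsearch.bulk()` takes `operations` and `pipeline` keyword arguments and blocks until the response arrives. The helper names, the pipeline name `my-embedding-pipeline`, and the batch size of 500 are hypothetical; if the index already has a `default_pipeline`, the `pipeline` argument can be dropped. The loop sends one batch at a time and collects the ids of documents that still failed, so they can be resent.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Doc = Tuple[str, Dict[str, Any]]  # (document id, document source)

# Hypothetical names: replace with your own index and ingest pipeline.
INDEX_NAME = "index_name"
PIPELINE_NAME = "my-embedding-pipeline"
# Keep each bulk request at or below the deployment's queue_capacity
# (1024 by default) so a single request cannot overflow the inference queue.
BATCH_SIZE = 500


def batches(docs: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while batch := list(islice(it, size)):
        yield batch


def to_operations(batch: List[Doc]) -> List[Dict[str, Any]]:
    """Build the alternating action/source entries the bulk API expects."""
    ops: List[Dict[str, Any]] = []
    for doc_id, source in batch:
        ops.append({"index": {"_index": INDEX_NAME, "_id": doc_id}})
        ops.append(source)
    return ops


def index_in_batches(client: Any, docs: Iterable[Doc], size: int = BATCH_SIZE) -> List[str]:
    """Send one bulk request per batch and return the ids that failed.

    client.bulk() blocks until Elasticsearch responds, so this loop only
    has one batch in flight at a time.
    """
    failed_ids: List[str] = []
    for batch in batches(docs, size):
        resp = client.bulk(operations=to_operations(batch), pipeline=PIPELINE_NAME)
        if resp["errors"]:
            for item in resp["items"]:
                result = item["index"]
                if "error" in result:  # e.g. status 500, queue full
                    failed_ids.append(result["_id"])
    return failed_ids


# Usage with the official client (not executed here):
# from elasticsearch import Elasticsearch
# client = Elasticsearch("http://localhost:9200")
# failed = index_in_batches(client, ((str(i), {"text": t}) for i, t in enumerate(texts)))
```

Batching by hand keeps the request size explicit and ties it directly to the queue capacity; the client's `helpers.streaming_bulk` with `chunk_size` is another way to send sequential chunks.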

Some relevant documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.3/start-trained-model-deployment.html

Example of starting a deployment with a specific queue capacity:

POST _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_start?wait_for=started&queue_capacity=2000
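Since the bulk insert is done from Python, the same start call can also be made from there. This is a minimal sketch, assuming the official `elasticsearch` Python client 8.x, whose `ml.start_trained_model_deployment()` method accepts `model_id`, `queue_capacity`, and `wait_for` keyword arguments; the wrapper function name is hypothetical, and the model id is the one from the request above. If the model is already deployed, it will likely need to be stopped (`ml.stop_trained_model_deployment()`) before it can be started again with a new queue capacity.

```python
from typing import Any

# Model id from the POST example above; replace with your own model.
MODEL_ID = "elastic__distilbert-base-uncased-finetuned-conll03-english"


def start_with_queue_capacity(client: Any, queue_capacity: int, model_id: str = MODEL_ID) -> Any:
    """Start a trained model deployment with a larger inference queue.

    Equivalent to:
    POST _ml/trained_models/<model_id>/deployment/_start?wait_for=started&queue_capacity=<n>
    """
    return client.ml.start_trained_model_deployment(
        model_id=model_id,
        queue_capacity=queue_capacity,
        wait_for="started",  # return once the deployment has started
    )


# Usage with the official client (not executed here):
# from elasticsearch import Elasticsearch
# client = Elasticsearch("http://localhost:9200")
# start_with_queue_capacity(client, queue_capacity=2000)
```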