elasticsearch-py

sync/async insert or update ElasticSearch in Python


I'm using the ElasticSearch bulk Python API. Does it provide both sync and async APIs?


Solution

  • If by sync you mean a blocking operation

    In Python, the bulk functions are synchronous. The easiest way to go is through the helper

    elasticsearch.helpers.bulk(client, actions, stats_only=False, **kwargs)
    

    It returns a tuple with summary information, so the call only returns once the operation has finished. It is thus synchronous.
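
    As a rough illustration of the helper call, the sketch below builds the actions with a generator and hands them to helpers.bulk. The names make_actions and bulk_index and the index my-index are made up for this example, and a reachable cluster is assumed.

```python
def make_actions(docs, index_name):
    """Yield one bulk action per document.

    docs maps a document id to its body; index_name is a hypothetical index.
    """
    for doc_id, body in docs.items():
        yield {
            "_op_type": "index",  # insert the document, or replace it if the id exists
            "_index": index_name,
            "_id": doc_id,
            "_source": body,
        }


def bulk_index(client, docs, index_name="my-index"):
    """Send all documents in one blocking call.

    With stats_only=False the helper returns (success_count, list_of_errors).
    """
    # Imported here so the action builder above works without the package installed.
    from elasticsearch import helpers

    return helpers.bulk(client, make_actions(docs, index_name), stats_only=False)
```

    With a client created as es = Elasticsearch(), a call such as bulk_index(es, {"1": {"title": "first"}}) returns only after the cluster has responded, which is what makes it blocking.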

  • If by sync you mean consistency

    From the bulk API documentation:

    When making bulk calls, you can require a minimum number of active shards in the partition through the consistency parameter

    In Python, the bulk function has a consistency parameter that lets you specify how many shards must have acknowledged the change before the method returns.

  • If by timeout you mean a way to stop the operation after a while

    If you need to limit the duration of a bulk operation, the low-level bulk() function is again your friend: it takes a timeout parameter that sets an explicit operation timeout.
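
    A minimal sketch of the low-level call with an operation timeout, assuming a client version whose bulk() accepts a body argument (other releases may expect a different parameter name). to_ndjson, bulk_with_timeout and my-index are hypothetical names used only for illustration.

```python
import json


def to_ndjson(docs, index_name):
    """Build the newline-delimited body expected by the bulk endpoint.

    Each document becomes two lines: an action line, then the source line.
    """
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def bulk_with_timeout(client, docs, index_name="my-index", timeout="5s"):
    """Call the low-level bulk API with an explicit operation timeout.

    timeout is a time-unit string such as "5s"; the client is passed in by the caller.
    """
    return client.bulk(body=to_ndjson(docs, index_name), timeout=timeout)
```

    The call still blocks; the timeout only bounds how long the operation is allowed to wait on the server side.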

    Even more generally,

    Global timeout can be set when constructing the client (see Connection's timeout parameter) or on a per-request basis using request_timeout (float value in seconds) as part of any API call.

    For example:

    from elasticsearch import Elasticsearch
    es = Elasticsearch()
    # only wait for 1 second, regardless of the client's default
    es.cluster.health(wait_for_status='yellow', request_timeout=1)
    

    As a side note, I searched for the bulk() call in Java, and especially bulk().await(), but couldn't find anything. May I ask for your source?