
Fluentd - Config Setting for Elasticsearch Bulk Batch Size


Is there a configuration option to control the number of records pushed to Elasticsearch per batch? In Logstash, the following related configuration options are available: --pipeline.batch.size and --pipeline.batch.delay.


Solution

  • In Logstash, pipeline.batch.size doesn't specify the number of records pushed to Elasticsearch; it is the maximum number of events each pipeline worker collects from the inputs before running the filters and outputs. By default there is one worker per available CPU core, configurable via pipeline.workers.

    So if you are running on an 8-core machine, you'll have by default 8 pipeline workers processing 125 events each, i.e. up to 1,000 events in flight at once. Those batches flow in parallel through the filter and output plugins.
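    As an illustration, these knobs live in logstash.yml; the values below just spell out the defaults on an 8-core machine and are not recommendations:

        # logstash.yml -- illustrative values (the defaults, spelled out)
        pipeline.workers: 8        # defaults to the number of available CPU cores
        pipeline.batch.size: 125   # events a worker collects before running filters/outputs
        pipeline.batch.delay: 50   # ms to wait for a full batch before flushing anyway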

    For the output part, say the elasticsearch output plugin, you don't get to choose how many records are sent per bulk request. By default, the plugin attempts to pack records into roughly 20 MB bulk requests.

    Concerning Fluentd, you control this through the output plugin's <buffer> section: each buffer chunk is flushed as one bulk request, so options such as chunk_limit_size, chunk_limit_records, and flush_interval bound the batch size and the flush delay. A sketch follows below.
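    As a minimal sketch, assuming the fluent-plugin-elasticsearch output plugin (the match tag, host, and all values are placeholders to adapt to your setup):

        # fluent.conf -- illustrative values only
        <match app.**>
          @type elasticsearch
          host elasticsearch.example.com   # placeholder
          port 9200

          <buffer>
            @type memory
            chunk_limit_size 8MB        # caps the byte size of each chunk, i.e. of each bulk request
            chunk_limit_records 5000    # also caps the number of records per chunk
            flush_interval 5s           # comparable to pipeline.batch.delay in Logstash
            flush_thread_count 2        # number of parallel flush threads
          </buffer>
        </match>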

    There is also a good article on how to tune this for Fluentd.