I wrote this short simple script
from elasticsearch import Elasticsearch
from fastavro import reader
es = Elasticsearch(['someIP:somePort'])
with open('data.avro', 'rb') as fo:
avro_reader = reader(fo)
for record in avro_reader:
es.index(index="my_index", body=record)
It works absolutely fine. Each record is a json and Elasticsearch can index json files. But rather than going one by one in a for loop, is there a way to do this in bulk? Because this is very slow.
There are 2 ways to do this.
requests
python from elasticsearch import Elasticsearch
from elasticsearch import helpers
from fastavro import reader
es = Elasticsearch(['someIP:somePort'])
with open('data.avro', 'rb') as fo:
avro_reader = reader(fo)
records = [
{
"_index": "my_index",
"_type": "record",
"_id": j,
"_source": record
}
for j,record in enumerate(avro_reader)
]
helpers.bulk(es, records)