python json elasticsearch elasticsearch-dsl

Index JSON files in Elasticsearch using Python?


I have a bunch of JSON files (100), named merged_file 1.json, merged_file 2.json, and so on.

How do I index all these files into Elasticsearch using Python (elasticsearch_dsl)?

I am using this code, but it doesn't seem to work:

from elasticsearch_dsl import Elasticsearch
import json
import os
import sys

es = Elasticsearch()

json_docs =[]

directory = sys.argv[1]

for filename in os.listdir(directory):
    if filename.endswith('.json'):
        with open(filename,'r') as open_file:
            json_docs.append(json.load(open_file))

es.bulk("index_name", "type_name", json_docs)

The JSON looks like this:

{"one":["some data"],"two":["some other data"],"three":["other data"]}

What can I do to make this work?


Solution

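  • For this task, use the low-level client, elasticsearch-py (pip install elasticsearch). Its helpers.bulk function builds the bulk request from an iterable of documents, so you do not have to assemble the request body yourself:

    from elasticsearch import Elasticsearch, helpers
    import json
    import os
    import sys
    
    # Pass your cluster URL here if your client version requires one
    es = Elasticsearch()
    
    def load_json(directory):
        """Yield one parsed document per .json file, so nothing is held in memory at once."""
        for filename in os.listdir(directory):
            if filename.endswith('.json'):
                # os.listdir returns bare file names, so join them with the directory
                with open(os.path.join(directory, filename), 'r') as open_file:
                    yield json.load(open_file)
    
    # doc_type belongs to older Elasticsearch versions; omit it on versions without mapping types
    helpers.bulk(es, load_json(sys.argv[1]), index='my-index', doc_type='my-type')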
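
If you also want to control the document IDs, one option is to yield full action dicts instead of bare documents. This is a minimal sketch under stated assumptions: generate_actions is a hypothetical helper name, and using the file name (without .json) as the ID is an assumption that only holds if file names are unique. The _index, _id, and _source keys are the action metadata that helpers.bulk reads.

```python
import json
from pathlib import Path


def generate_actions(directory, index_name):
    """Yield one bulk action dict per .json file in `directory`."""
    # Sort so the indexing order is stable from run to run.
    for path in sorted(Path(directory).glob("*.json")):
        with path.open("r", encoding="utf-8") as open_file:
            document = json.load(open_file)
        yield {
            "_index": index_name,
            # Assumption: the file name without ".json" is a usable, unique ID.
            "_id": path.stem,
            "_source": document,
        }
```

The generator can be passed straight to the helper, for example helpers.bulk(es, generate_actions(sys.argv[1], 'my-index')). Because each file maps to a fixed ID, running the script again should overwrite the existing documents rather than create duplicates.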