Tags: python, elasticsearch, python-requests, http-status-code-400

400 from Elasticsearch when submitting a POST request to the "_bulk" endpoint


I've successfully managed to submit a bulk POST request using the elasticsearch module.

But I want to do everything NOT using this module, just using the requests module.

I'm having problems when I try to do a POST to the _bulk endpoint. This is the code, where post_object is a string built by concatenating json.dumps(...) + '\n' for each of the individual documents that should populate the index. NB url is simply 'http://localhost:9200/docs'.

print(f'post_object:\n{json.dumps(post_object, indent=2, default=str)}')
headers = {'Content-type': 'application/json'}        
params = {'grant_type' : 'client_credentials'}                
response = requests.post(f'{url}/_bulk', data=json.dumps(post_object, default=str) + '\n', headers=headers, params=params)        
print(f'response.json() {response.json()}')

This gives:

response.json() {'error': {'root_cause': [{'type': 'illegal_argument_exception', 'reason': 'Malformed action/metadata line [1], expected START_OBJECT but found [VALUE_STRING]'}], 'type': 'illegal_argument_exception', 'reason': 'Malformed action/metadata line [1], expected START_OBJECT but found [VALUE_STRING]'}, 'status': 400}

I had previously tried making post_object a list of individual JSON dicts... But this gives 400 for another reason:

response.json() {'error': {'root_cause': [{'type': 'illegal_argument_exception', 'reason': 'Malformed action/metadata line [1], expected START_OBJECT but found [START_ARRAY]'}], 'type': 'illegal_argument_exception', 'reason': 'Malformed action/metadata line [1], expected START_OBJECT but found [START_ARRAY]'}, 'status': 400}

... does anyone know what sort of "START_OBJECT" Elasticsearch expects to find at this _bulk endpoint?

NB there are various pages devoted to this, e.g. here or here. I appear to have to submit a string as the data param, apparently with some metadata element for each document, and with newlines in the right places. I also apparently have to set the "Content-type" header to "application/x-ndjson", which I've now done... but I'd just like a simple example, ideally!
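
Both error messages can be reproduced locally, without a running Elasticsearch. The sketch below is only an illustration (the sample `docs` list is made up, not taken from the question): it shows what the first character of the request body is in each attempt, which is what the `expected START_OBJECT but found [...]` part of the message is complaining about.

```python
import json

# Made-up sample documents, for illustration only.
docs = [{"interesting_data": "something"}, {"interesting_data": "something else"}]

# Attempt 1: the documents are already concatenated into one string,
# and json.dumps() is then applied to that string a second time.
already_a_string = "".join(json.dumps(d) + "\n" for d in docs)
double_encoded = json.dumps(already_a_string)
# The body now starts with a double quote: it is one big JSON string literal.
print(repr(double_encoded[0]))

# Attempt 2: json.dumps() on a list produces a JSON array, starting with "[".
as_array = json.dumps(docs)
print(repr(as_array[0]))

# What a bulk body needs instead: each line is its own JSON object, starting with "{".
first_line = already_a_string.splitlines()[0]
print(repr(first_line[0]))
```

So the string should be passed to `data=` as it is, without a further `json.dumps()`. Separately, each document line also needs an action/metadata line in front of it, which the attempts above do not have.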


Solution

  • Got it. NB version of ES is 7.12, Python 3.9.4.

    Just thought I'd leave this here to save someone else half an hour's wasted time.

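    Yes, setting the "Content-type" header to "application/x-ndjson" turns out to be essential. It is indeed necessary to submit a string as the data parameter, and in that string each document line must be preceded by its own action/metadata line (here an "index" action carrying the "_id"). Thus: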

    import datetime
    import json

    import requests

    post_object_str = ''
    for num, obj in enumerate(...):  # replace ... with your own iterable of source objects
        # build the dict for one ES document
        dict_doc = {}
        # fill this with your fields, e.g.:
        dict_doc['interesting_data'] = 'something'
        dict_doc['timestamp'] = datetime.datetime.now()
        # NB "_id" must go in the action/metadata line: ES complains if it is put in the document body
        post_object_str += f'{{"index": {{"_id": "{num}"}}}}\n'
        post_object_str += json.dumps(dict_doc, default=str) + '\n'

    # see what it looks like
    print(f'post_object_str:\n{post_object_str[:400]}')

    # url is 'http://localhost:9200/docs', as in the question
    headers = {'Content-type': 'application/x-ndjson'}
    response = requests.post(f'{url}/_bulk', data=post_object_str, headers=headers)
    # examine the response...
    print(f'post response.json():\n{json.dumps(response.json(), indent=2)}')
    

    200: index populated in a flash. Beautiful.
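
For anyone who wants something to paste and try, here is a minimal self-contained sketch of the same idea. The helper names `build_bulk_body` and `failed_items` and the sample documents are my own, purely for illustration. It builds the action line with `json.dumps()` rather than an f-string, so that quoting of the `_id` is handled automatically. It also looks at the `errors` flag in the response body, because a bulk request can come back with HTTP 200 even when some individual items were rejected.

```python
import datetime
import json


def build_bulk_body(docs):
    """Return an NDJSON string: one action line plus one source line per document."""
    lines = []
    for num, doc in enumerate(docs):
        # Action/metadata line: an "index" action carrying the document's _id.
        lines.append(json.dumps({"index": {"_id": str(num)}}))
        # Source line: default=str turns datetime objects into strings.
        lines.append(json.dumps(doc, default=str))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


def failed_items(bulk_response):
    """Return the per-item results of a parsed bulk response that contain an "error" key."""
    if not bulk_response.get("errors"):
        return []
    failures = []
    for item in bulk_response.get("items", []):
        # Each item is a dict with a single key naming the action, e.g. "index".
        for result in item.values():
            if "error" in result:
                failures.append(result)
    return failures


# Made-up sample documents, for illustration only.
docs = [
    {"interesting_data": "something", "timestamp": datetime.datetime(2021, 5, 1, 12, 0)},
    {"interesting_data": "something else", "timestamp": datetime.datetime(2021, 5, 1, 12, 5)},
]
body = build_bulk_body(docs)
print(body)

# Sending it needs a running Elasticsearch, so this part is left commented out:
# import requests
# headers = {"Content-Type": "application/x-ndjson"}
# response = requests.post("http://localhost:9200/docs/_bulk", data=body, headers=headers)
# print(failed_items(response.json()))
```

Note that `body` goes to `data=` unchanged. Passing it through `json.dumps()` again, or using the `json=` parameter of requests, would turn it back into a single JSON string and bring back the 400.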