
400 from Elasticsearch when submitting a POST request to the "_bulk" endpoint

I've successfully managed to submit a bulk POST request using the elasticsearch module.

But I want to do everything without this module, using only the requests module.

I'm having problems when I POST to the _bulk endpoint. Here is the code. post_object is a string made by concatenating json.dumps(...) + '\n' for each of the documents that should populate the index. NB url is simply 'http://localhost:9200/docs'.

import json
import requests

url = 'http://localhost:9200/docs'
print(f'post_object:\n{json.dumps(post_object, indent=2, default=str)}')
headers = {'Content-type': 'application/json'}
params = {'grant_type': 'client_credentials'}
# NB json.dumps() on a str re-encodes the whole thing as one quoted JSON string
response = requests.post(f'{url}/_bulk', data=json.dumps(post_object, default=str) + '\n', headers=headers, params=params)
print(f'response.json() {response.json()}')

This gives:

response.json() {'error': {'root_cause': [{'type': 'illegal_argument_exception', 'reason': 'Malformed action/metadata line [1], expected START_OBJECT but found [VALUE_STRING]'}], 'type': 'illegal_argument_exception', 'reason': 'Malformed action/metadata line [1], expected START_OBJECT but found [VALUE_STRING]'}, 'status': 400}

I had previously tried making post_object a list of individual JSON dicts, but that gives a 400 for a different reason:

response.json() {'error': {'root_cause': [{'type': 'illegal_argument_exception', 'reason': 'Malformed action/metadata line [1], expected START_OBJECT but found [START_ARRAY]'}], 'type': 'illegal_argument_exception', 'reason': 'Malformed action/metadata line [1], expected START_OBJECT but found [START_ARRAY]'}, 'status': 400}
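Both error messages can be reproduced locally without a cluster. Elasticsearch parses line 1 of a _bulk body and expects it to open with `{` (START_OBJECT). The sketch below uses made-up document contents for illustration. It shows that json.dumps() applied to an already-built string produces a quoted JSON string (VALUE_STRING), and applied to a list it produces `[` (START_ARRAY).

```python
import json

# A body that is already correctly built NDJSON (illustrative data)
body = '{"index": {"_id": "0"}}\n{"interesting_data": "something"}\n'

# Re-encoding the finished string wraps it in quotes: line 1 is a JSON *string*
double_encoded = json.dumps(body, default=str) + '\n'
print(double_encoded[:30])

# Encoding a list of dicts yields a single JSON *array* on line 1
as_list = json.dumps([{"index": {"_id": "0"}}, {"interesting_data": "something"}])
print(as_list[:30])

# Only the original string starts with '{' as the bulk parser wants
print(body[0], double_encoded[0], as_list[0])
```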

Does anyone know what sort of START_OBJECT Elasticsearch expects to find at this _bulk endpoint?

NB there are various pages devoted to this, e.g. here or here. It seems I have to submit a string as the data param, with a metadata (action) line for each document and newlines in the right places. I also apparently have to set the "Content-type" header to "application/x-ndjson", which I've now done... but ideally I'd just like a simple example!
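The rules described above (every line a JSON object, newline-terminated) can be checked locally before anything is sent. This is a minimal sketch. check_bulk_body is a hypothetical helper name and is not part of requests or elasticsearch. It only verifies that each line parses as a JSON object and that the body ends with a newline. It does not check that action and source lines are paired correctly.

```python
import json


def check_bulk_body(body: str) -> int:
    """Hypothetical local check: every line of a _bulk body must be a JSON object.

    Returns the number of lines; raises ValueError on the first bad line.
    """
    if not body.endswith('\n'):
        raise ValueError('bulk body must end with a newline')
    lines = body[:-1].split('\n')
    for number, line in enumerate(lines, start=1):
        # json.loads raises JSONDecodeError (a ValueError) on non-JSON lines
        if not isinstance(json.loads(line), dict):
            raise ValueError(f'line [{number}] is not a JSON object')
    return len(lines)


# Minimal body: an action/metadata line followed by the document source, per document
body = (
    '{"index": {"_id": "1"}}\n'
    '{"interesting_data": "something"}\n'
    '{"index": {"_id": "2"}}\n'
    '{"interesting_data": "something else"}\n'
)
print(check_bulk_body(body))  # 4
```

Passing json.dumps(body) + '\n' or json.dumps(a_list) + '\n' to this check raises ValueError on line [1], which matches the two failures described above.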

Solution

Got it. NB version of ES is 7.12, Python 3.9.4.

Just thought I'd leave this here to save someone else half an hour's wasted time.

Yes, setting "Content-type" to "application/x-ndjson" turns out to be essential. It is indeed necessary to submit the NDJSON string itself as the data parameter, without passing it through json.dumps() again. Thus:

import datetime
import json
import requests

post_object_str = ''
for num, obj in enumerate(...):  # ... stands for your own iterable (elided here)
    # iteration loop creating the dicts to prepare the ES documents
    dict_doc = {}
    # fill this with your fields, e.g.:
    dict_doc['interesting_data'] = 'something'
    dict_doc['timestamp'] = datetime.datetime.now()
    # NB ES rejects "_id" inside the document itself; it must go in the action/metadata line
    post_object_str += json.dumps({'index': {'_id': str(num)}}) + '\n'
    post_object_str += json.dumps(dict_doc, default=str) + '\n'
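The same body can also be built by collecting the lines in a list and joining them once, rather than concatenating onto a string inside the loop. This is a sketch only. build_bulk_body is a hypothetical helper name, and the sample documents are made up. It emits the same "index" action line followed by a source line for each document.

```python
import datetime
import json


def build_bulk_body(docs):
    """Hypothetical helper: turn (doc_id, doc_dict) pairs into an NDJSON _bulk body."""
    lines = []
    for doc_id, doc in docs:
        # action/metadata line: "_id" belongs here, not in the document source
        lines.append(json.dumps({'index': {'_id': str(doc_id)}}))
        # source line; default=str turns datetimes etc. into strings
        lines.append(json.dumps(doc, default=str))
    # every line, including the last one, must end with '\n'; no docs -> empty body
    return '\n'.join(lines) + '\n' if lines else ''


docs = [
    (0, {'interesting_data': 'something'}),
    (1, {'interesting_data': 'other', 'timestamp': datetime.datetime(2021, 5, 1, 12, 0)}),
]
print(build_bulk_body(docs))
```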

# see what it looks like    
print(f'post_object_str:\n{post_object_str[:400]}')
        
headers = {'Content-type': 'application/x-ndjson'}        
response = requests.post(f'{url}/_bulk', data=post_object_str, headers=headers)
# examine the response...
print(f'post response.json():\n{json.dumps(response.json(), indent=2)}')
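When examining the response, note that a 200 from _bulk means the request as a whole was accepted. Individual documents can still fail. The top-level "errors" flag indicates whether any item failed, and each failed item carries an "error" entry under its action name. The sketch below shows one way to pull those out. bulk_failures is a hypothetical helper name, and the sample response uses made-up values that only mimic the shape of a real one.

```python
def bulk_failures(result: dict) -> list:
    """Hypothetical helper: collect the failed items from a parsed _bulk response."""
    if not result.get('errors'):
        return []
    failures = []
    for item in result.get('items', []):
        # each item is keyed by its action name, e.g. "index", "create", "update", "delete"
        for action, outcome in item.items():
            if 'error' in outcome:
                failures.append((action, outcome.get('_id'), outcome['error']))
    return failures


# Illustrative response shape (made-up values); in practice: result = response.json()
result = {
    'took': 3,
    'errors': True,
    'items': [
        {'index': {'_id': '0', 'status': 201, 'result': 'created'}},
        {'index': {'_id': '1', 'status': 400,
                   'error': {'type': 'mapper_parsing_exception', 'reason': 'failed to parse'}}},
    ],
}
print(bulk_failures(result))
```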

200: index populated in a flash. Beautiful.