I've successfully submitted a bulk POST request using the elasticsearch module.
But I want to do everything without that module, using only the requests module.
I'm having problems when I try to do a POST to the _bulk endpoint. This is the code, where post_object is a string of concatenated json.dumps(...) + '\n' of the individual documents for populating the index. NB url is simply 'http://localhost:9200/docs'.
print(f'post_object:\n{json.dumps(post_object, indent=2, default=str)}')
headers = {'Content-type': 'application/json'}
params = {'grant_type' : 'client_credentials'}
response = requests.post(f'{url}/_bulk', data=json.dumps(post_object, default=str) + '\n', headers=headers, params=params)
print(f'response.json() {response.json()}')
This gives:
response.json() {'error': {'root_cause': [{'type': 'illegal_argument_exception', 'reason': 'Malformed action/metadata line [1], expected START_OBJECT but found [VALUE_STRING]'}], 'type': 'illegal_argument_exception', 'reason': 'Malformed action/metadata line [1], expected START_OBJECT but found [VALUE_STRING]'}, 'status': 400}
I had previously tried making post_object a list of individual JSON dicts... But this gives a 400 for another reason:
response.json() {'error': {'root_cause': [{'type': 'illegal_argument_exception', 'reason': 'Malformed action/metadata line [1], expected START_OBJECT but found [START_ARRAY]'}], 'type': 'illegal_argument_exception', 'reason': 'Malformed action/metadata line [1], expected START_OBJECT but found [START_ARRAY]'}, 'status': 400}
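For what it's worth, my reading of the two error messages is that they both come from the extra json.dumps(...) call on the whole body, not from requests itself. The sketch below needs no Elasticsearch server; it only shows what the request body starts with in each case. The sample documents are made up for illustration.

```python
import json

# Hypothetical sample documents, for illustration only.
docs = [{"interesting_data": "something"}, {"interesting_data": "something else"}]

# Case 1: post_object is already a newline-delimited string, and json.dumps
# is applied to it a second time. The body becomes ONE quoted JSON string,
# which matches "expected START_OBJECT but found [VALUE_STRING]".
ndjson_str = "".join(json.dumps(d) + "\n" for d in docs)
double_encoded = json.dumps(ndjson_str)

# Case 2: post_object is a list of dicts. json.dumps turns it into a JSON
# array, which matches "expected START_OBJECT but found [START_ARRAY]".
as_array = json.dumps(docs)

# What the first line of a bulk body has to be: a JSON object on its own line.
print("double-encoded body starts with:", double_encoded[0])
print("list body starts with:", as_array[0])
print("newline-delimited body starts with:", ndjson_str[0])
```

So the body has to be sent as a plain string, one JSON object per line, without wrapping the whole thing in another json.dumps(...).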
... does anyone know what sort of "START_OBJECT" the _bulk endpoint wants to find when posting with requests?
NB there are various pages devoted to this, e.g. here or here. I appear to have to submit a string as the data param, apparently with some metadata element, and with newlines in the right places. I also apparently have to set the "Content-type" header to "application/x-ndjson", which I've now done... but I'd just like a simple example, ideally!
Got it. NB version of ES is 7.12, Python 3.9.4.
Just thought I'd leave this here to save someone else half an hour's wasted time.
Yes, setting "Content-type" to "application/x-ndjson" turns out to be essential. It is also necessary to submit a string as the data parameter. Thus:
import datetime
import json

import requests

post_object_str = ''
for num, obj in enumerate( ...
    # iteration loop creating the dicts to prepare the ES documents
    dict_doc = {}
    # fill this with your fields, e.g.:
    dict_doc['interesting_data'] = 'something'
    dict_doc['timestamp'] = datetime.datetime.now()
    # NB you get a complaint if you try to include the "_id" field not as "metadata"
    # each document is two lines: an action/metadata line, then the document itself
    post_object_str += f'{{"index": {{"_id": "{num}"}}}}\n'
    post_object_str += json.dumps(dict_doc, default=str) + '\n'
# see what it looks like
print(f'post_object_str:\n{post_object_str[:400]}')
headers = {'Content-type': 'application/x-ndjson'}
# send the string as-is: no json.dumps around the whole body
response = requests.post(f'{url}/_bulk', data=post_object_str, headers=headers)
# examine the response...
print(f'post response.json():\n{json.dumps(response.json(), indent=2)}')
200: index populated in a flash. Beautiful.
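One caveat on that 200: as far as I know, the bulk endpoint can return 200 even when individual documents were rejected, and signals this through the "errors" flag and the per-item entries in the response JSON. Here is a minimal sketch of the same approach wrapped in two helpers. The names build_bulk_body and failed_items are my own, not part of requests or Elasticsearch, and the input is assumed to be (id, dict) pairs.

```python
import json


def build_bulk_body(docs):
    """Build a newline-delimited bulk body from (doc_id, dict) pairs.

    Hypothetical helper: each document becomes an "index" action line
    followed by the document source line.
    """
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        # default=str lets datetime values and similar be serialized as text
        lines.append(json.dumps(doc, default=str))
    # the body must end with a newline
    return "\n".join(lines) + "\n"


def failed_items(bulk_response):
    """Return the per-item results that carry an "error" entry.

    Hypothetical helper operating on the parsed JSON of a bulk response.
    """
    if not bulk_response.get("errors"):
        return []
    failures = []
    for item in bulk_response.get("items", []):
        # each item is keyed by its action name, e.g. "index"
        for result in item.values():
            if "error" in result:
                failures.append(result)
    return failures


body = build_bulk_body([(0, {"interesting_data": "something"}),
                        (1, {"interesting_data": "something else"})])
print(body)
```

With a running server this would be used as requests.post(f'{url}/_bulk', data=body, headers={'Content-type': 'application/x-ndjson'}), followed by failed_items(response.json()) to see whether anything was rejected.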