How can I limit the amount of data uploaded to Elasticsearch? My old laptop cannot handle a huge dataset like the one I'm using.
I have used the following code to 'limit' the data to be uploaded:
from elasticsearch import helpers, Elasticsearch
import csv
import itertools

with open('my_data.csv', encoding="utf8") as f:
    reader = csv.DictReader(f)
    for row in itertools.islice(reader, 1000):  # limitation of data
        helpers.bulk(es, reader, index='movie-plots', doc_type=None)
But this is apparently not working; when I check with 'POST movie-plots/_count', it returns the document count of the entire dataset.
I am completely new to Elasticsearch, so sorry if this is a novice question. I am using the Python client (in a Jupyter notebook) to work with Elasticsearch and Kibana.
You are calling islice on reader ... but then you are passing all of reader to helpers.bulk anyway. On the first pass through the loop, bulk drains everything that is left in reader, which is why the whole dataset ends up indexed.
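
You can see the same effect with a plain iterator; this toy sketch (standard library only, nothing Elasticsearch-specific) mirrors what happens inside your loop:

import itertools

nums = iter(range(10))
for n in itertools.islice(nums, 3):  # meant to take only the first 3 items
    consumed = list(nums)            # like bulk(es, reader, ...): drains the underlying iterator
print(consumed)  # [1, 2, 3, 4, 5, 6, 7, 8, 9] -- everything after the first item

The loop body runs only once, because draining nums leaves nothing for islice to yield on the next iteration.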
Not in a place where I can test, but try removing the for loop and just passing the islice to helpers.bulk directly:
with open('my_data.csv', encoding="utf8") as f:
    reader = csv.DictReader(f)
    # bulk consumes only the islice, so just the first 1000 rows are uploaded
    helpers.bulk(es, itertools.islice(reader, 1000), index='movie-plots', doc_type=None)
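
One caveat: your earlier run already indexed the whole file, and re-running will add new documents on top of the old ones (rows bulk-indexed without an explicit _id get auto-generated ids, so they won't overwrite). A minimal sketch for starting clean and then verifying from the notebook, assuming es is the same connected client and the index name from your question:

# start from a clean index so the old documents don't inflate the count
es.indices.delete(index='movie-plots', ignore_unavailable=True)

# ... re-run the bulk upload above ...

es.indices.refresh(index='movie-plots')        # make the new documents visible to _count
print(es.count(index='movie-plots')['count'])  # should now print 1000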