I'm trying to move data between two Elasticsearch instances. Is there a way to skip the documents that already exist in the target index?
```python
from opensearchpy import OpenSearch, RequestsHttpConnection, helpers

def reindex_data_to_data_curation_es(es_src, es_des, src_idx, tar_idx):
    try:
        # Copy every document from src_idx (source cluster) to tar_idx (target cluster)
        helpers.reindex(es_src, src_idx, tar_idx, target_client=es_des,
                        query={'query': {'match_all': {}}})
    except Exception as e:
        print("timed out", str(e))
```
You cannot skip them on the source side, but you can make sure they are not overwritten in the target index if they already exist. Simply add the `op_type='create'` setting so that existing documents in the target index are not overwritten:
```python
helpers.reindex(es_src, src_idx, tar_idx, target_client=es_des,
                op_type='create',  # add this
                query={'query': {'match_all': {}}})
```
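The same idea can be written out with the lower-level `helpers.scan` and `helpers.bulk` helpers, which makes the per-document `_op_type` explicit. This is a minimal sketch, assuming opensearch-py and the same `es_src`/`es_des` clients; `create_actions` and `copy_missing_docs` are hypothetical names. One caveat applies to both versions: a `create` for an `_id` that already exists comes back as a 409 version conflict, and the bulk helper raises `BulkIndexError` on item errors by default. The sketch therefore passes `raise_on_error=False` and counts 409s as skipped documents.

```python
from typing import Any, Dict, Iterable, Iterator


def create_actions(hits: Iterable[Dict[str, Any]], target_index: str) -> Iterator[Dict[str, Any]]:
    """Turn search hits from the source cluster into bulk 'create' actions.

    A 'create' fails with a 409 version conflict when a document with the same
    _id already exists in target_index, so existing documents are left untouched.
    """
    for hit in hits:
        yield {
            "_op_type": "create",
            "_index": target_index,
            "_id": hit["_id"],
            "_source": hit["_source"],
        }


def copy_missing_docs(es_src, es_des, src_idx: str, tar_idx: str, chunk_size: int = 500):
    """Copy documents from src_idx to tar_idx, skipping IDs that already exist.

    Returns (number created, number skipped as already existing, other errors).
    """
    # Imported here so create_actions can be used without opensearch-py installed.
    from opensearchpy import helpers

    hits = helpers.scan(es_src, index=src_idx, query={"query": {"match_all": {}}})
    created, errors = helpers.bulk(
        es_des,
        create_actions(hits, tar_idx),
        chunk_size=chunk_size,
        raise_on_error=False,  # collect item failures instead of raising BulkIndexError
    )
    # Status 409 marks documents that already existed (the ones we meant to skip);
    # anything else is a real failure worth surfacing.
    real_errors = [e for e in errors if e.get("create", {}).get("status") != 409]
    return created, len(errors) - len(real_errors), real_errors
```

Usage would look like `created, skipped, errors = copy_missing_docs(es_src, es_des, src_idx, tar_idx)`. If you stay with `helpers.reindex`, the analogous knob is presumably `bulk_kwargs={'raise_on_error': False}`, since `reindex` forwards `bulk_kwargs` to the bulk helper.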