I'm trying to update documents in Elasticsearch based on Kafka messages (Kafka as the stream source). Writing to Elasticsearch in bulk, using windows and the Elasticsearch connector as a sink, works fine. However, we also need to update existing documents, and read them in a bulk-performant manner: not per tuple, but e.g. for the whole window after a keyBy() split that we want to aggregate over.
We are currently using Storm Trident, which performs bulk reads before a persistentAggregate and writes the updated aggregations back afterwards, minimizing interaction with the backend. I just can't find something similar in Flink. Any hints?
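For context, the write path that already works looks roughly like this (a simplified sketch: Event, EventSchema, kafkaProps and the sink instance are placeholders for our own types and connector setup, and the exact Kafka/Elasticsearch connector classes depend on the Flink version):

DataStream<Event> events = env.addSource(
        new FlinkKafkaConsumer<>("doc-updates", new EventSchema(), kafkaProps));

events
    .keyBy(Event::getDocId)          // one key per Elasticsearch document
    .timeWindow(Time.seconds(10))    // collect a bulk's worth of updates
    .reduce(Event::merge)            // pre-aggregate per key within the window
    .addSink(elasticsearchSink);     // Elasticsearch connector sink, bulk-indexing the results

What I can't find is a bulk-performant way to read the existing documents into that aggregation step.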
How about running two window calls on the stream:

window1 - to bulk read the current documents from Elasticsearch
window2 - to bulk push the updated documents back into Elasticsearch
In pseudocode:

streamData
    .window1(bulk read from Elasticsearch + update/join)
    .process(...)
    .window2(bulk push to Elasticsearch)
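Concretely, a rough sketch with the DataStream API could look like the following. The Flink parts (keyBy, timeWindow, ProcessWindowFunction, Collector, the Elasticsearch sink) are standard API, but Update, MergedDoc and the EsClient wrapper are hypothetical placeholders for your own types and whatever Elasticsearch client you use. Keying by a coarse hash bucket instead of the document id is one way to get many documents into a single window pane, so the read step can be a real multi-get:

streamData
    // key by a coarse hash bucket so one window pane holds many document ids
    .keyBy(u -> Math.floorMod(u.getDocId().hashCode(), 32))
    .timeWindow(Time.seconds(30))
    // "window1": one bulk read (mget) per pane, then merge the pane's updates into it
    .process(new ProcessWindowFunction<Update, MergedDoc, Integer, TimeWindow>() {
        private transient EsClient es;                      // hypothetical client wrapper

        @Override
        public void open(Configuration parameters) {
            es = EsClient.connect("http://es-host:9200");   // one client per parallel task
        }

        @Override
        public void process(Integer bucket, Context ctx,
                            Iterable<Update> input, Collector<MergedDoc> out) {
            List<Update> batch = new ArrayList<>();
            input.forEach(batch::add);

            // single bulk read for every document touched in this pane
            Map<String, MergedDoc> current = es.multiGet(
                    batch.stream().map(Update::getDocId).collect(Collectors.toSet()));

            // aggregate the pane's updates onto the current documents
            for (Update u : batch) {
                current.computeIfAbsent(u.getDocId(), MergedDoc::empty).apply(u);
            }
            current.values().forEach(out::collect);
        }
    })
    // "window2": the sink itself does the bulk push
    .addSink(elasticsearchSink);

If your Flink version predates ProcessWindowFunction, a RichWindowFunction with apply() gives you the same per-window hook. The second "window" usually isn't needed as a window at all, since the Elasticsearch sink batches index requests internally (flushing by count, size or interval).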