I am implementing the MisraGries algorithm with Flink's DataStream API. It keeps k
counters to record the data summary by increment or decrement.
What is the best approach to store such counters when using DataStream API to implement the algorithm? Now I just declared a HashMap
variable in the operator. Is this the right approach or do I need to use some other features like state?
You should store the counters in Flink's managed state, i.e., either keyed state or operator state and enable checkpointing. Otherwise, the information will be lost in case of a failure.
If state is correctly used and checkpointing is enabled, Flink periodically checkpoints the state of an application. In case of a failure, the job is restarted and its state is reset to the latest checkpoint.