Probably similar to the question posted here, but my goal is to share a concurrent hashmap across all parallelisms in a single flat map operator.
I have a hashmap that contains a <Integer, Integer> mapping, and I want this map to be shared among all the task slots that run my flatmap function.
The reason that I want it shared is because my flatmap is keyed via 2 values in a Tuple2, the first value being the key I care about and the second value being a secondary key as I want each event with the two key combo to end up in one worker and aggregated. Therefore, the first key may end up in multiple worker threads, so I want a way to share a gigantic hashmap across all of them like a cache of some sort.
Is this possible with Flink? Is it even a good idea? Thanks.
As soon as you add the second tuple field to the key, you're not guaranteed that each key <A, x>
will get sent to a slot that's in the same TaskManager instance. So unless you're running with a single TM, this won't work.