apache-flink

Is it possible to distribute MapState values across several machines with Apache Flink?


I have a use case with a potentially large map whose values I would like to distribute over several machines in order to perform stream processing. Is it possible to achieve that with MapState when running Apache Flink in cluster mode? Or is it only possible to parallelize the computation across several threads on the same machine? Does KeyedStream provide a way to achieve this?


Solution

  • MapState is a kind of key-partitioned state: each node in the cluster is responsible for a disjoint subset of the key space, so the state is spread across machines, not just across threads on one machine. MapState is for use with KeyedStreams, and you effectively end up with a sharded key/value store in which each value is itself a map.

    You might instead be looking for ValueState<T>, in which case you'll have an object of type T associated with each key.
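
To make the "sharded key/value store whose values are maps" picture concrete, here is a minimal plain-Java sketch. It is a toy model, not Flink code: the class `ShardedMapStateSketch`, its methods, and the modulo-based routing are all hypothetical and chosen for illustration. Flink itself routes keys through key groups that are assigned to parallel subtasks, which this sketch does not reproduce.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of key-partitioned map state. Hypothetical; does not use the Flink API. */
public class ShardedMapStateSketch {

    // One map per simulated worker: stream key -> (map key -> value).
    private final List<Map<String, Map<String, Long>>> workers = new ArrayList<>();

    public ShardedMapStateSketch(int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be at least 1");
        }
        for (int i = 0; i < parallelism; i++) {
            workers.add(new HashMap<>());
        }
    }

    // Simplified routing (an assumption): hash of the stream key modulo the worker count.
    // Each stream key therefore always lands on exactly one worker.
    public int workerFor(String key) {
        return Math.floorMod(key.hashCode(), workers.size());
    }

    // Analogous to writing one entry into the map kept for the given stream key.
    public void put(String key, String mapKey, long value) {
        workers.get(workerFor(key))
               .computeIfAbsent(key, k -> new HashMap<>())
               .put(mapKey, value);
    }

    // Analogous to reading one entry; returns null if the key or map entry is absent.
    public Long get(String key, String mapKey) {
        Map<String, Long> perKeyMap = workers.get(workerFor(key)).get(key);
        return perKeyMap == null ? null : perKeyMap.get(mapKey);
    }

    // Number of distinct stream keys whose maps live on the given worker.
    public int keysOnWorker(int worker) {
        return workers.get(worker).size();
    }

    public static void main(String[] args) {
        ShardedMapStateSketch state = new ShardedMapStateSketch(3);
        state.put("user-1", "clicks", 5L);
        state.put("user-1", "views", 12L);
        state.put("user-2", "clicks", 1L);
        System.out.println(state.get("user-1", "views")); // prints 12
    }
}
```

In this model a single stream key's map always lives on one worker, so distribution comes from having many keys. That suggests one very large map stored under a single key would not be spread out; choosing a finer-grained key for the KeyedStream is what shards the data.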