Search code examples
apache-flinkflink-streaming

How many instances of Flink Functions is created?


Assuming the following pipeline:

input.filter(new RichFilterFunction<MyPojo>() {
        @Override
        public boolean filter(MyPojo value) throws Exception {
            return false;
        }
     });

How many instances of the above rich function will be created?

  • Per task with no exceptions
  • Per task, however all parallel tasks on a particular node share one instance, since they are part of one JVM instance

Solution

  • There will always be as many instances as the parallelism indicates. There are two reasons related to state for that:

    1. If your function maintains a state, especially in a keyed context, a shared instance would cause unintended side effects.
    2. In the early days, users liked to maintain their own state (e.g., remembering previous value). Even though, it's heavily discouraged, it would still be bad if Flink could not support that.