apache-flink

Is there any way to get the taskManager Id within a map in Apache Flink?


Using custom partitioning in Apache Flink, we specify a key for each record so that it is assigned to a particular TaskManager.

Suppose we broadcast a dataset to all of the nodes (TaskManagers). Is there any way, within a map or flatMap, to get the TaskManager ID?


Solution

  • A custom partitioner does not assign records to a TaskManager but to a specific parallel task instance of the subsequent operator (a TM can execute multiple parallel task instances of the same operator).

    You can access the ID of a parallel task instance by extending a RichFunction, e.g., extend RichMapFunction instead of implementing MapFunction. Rich functions are available for all transformations. A RichFunction gives access to the RuntimeContext, which tells you the ID (the subtask index) of the parallel task instance:

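    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public static class MyMapper extends RichMapFunction<Long, Long> {

        // Index of this parallel task instance, in [0, parallelism)
        private transient int pId;

        @Override
        public void open(Configuration config) {
            // Called once per parallel instance, before the first map() call
            pId = getRuntimeContext().getIndexOfThisSubtask();
        }

        @Override
        public Long map(Long value) throws Exception {
            // pId is available here; replace the pass-through with real logic
            return value;
        }
    }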
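If you really need something that distinguishes TaskManagers rather than parallel task instances, one possible workaround relies on the fact that each TaskManager runs as its own JVM process, so a JVM-level identifier is shared by every task instance running in that TaskManager. The sketch below is plain Java and not a Flink API: the class name `JvmIdentity` and the method `describe()` are made up for illustration, and the returned string is only a stand-in for a TaskManager ID (its format, commonly `pid@hostname`, depends on the JVM). It could be called from `open()` of a rich function, next to the subtask index.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Flink: identifies the JVM it runs in.
final class JvmIdentity {

    private JvmIdentity() {}

    // Returns the runtime name of the current JVM. On common JVMs this looks
    // like "pid@hostname", but the exact format is implementation-specific.
    static String describe() {
        return ManagementFactory.getRuntimeMXBean().getName();
    }

    public static void main(String[] args) {
        // Every call made inside the same JVM process returns the same value,
        // so task instances in one TaskManager process would see one name.
        System.out.println("Running in JVM: " + JvmIdentity.describe());
    }
}
```

Keep in mind that in a local or IDE test setup several TaskManagers may share a single JVM, so this value should only be expected to separate TaskManagers on a real cluster deployment.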