Flink slots metrics (available/running/registered)


What exactly is the difference between these metrics that Flink exposes?

Thanks!


Solution

  • A slot is the unit of scheduling in Flink. To a first approximation, you can think of it as a thread plus some memory. Each task manager (worker) provides one or more slots.

    A job is an application that is running. Conceptually it is organized as a directed graph, with data flowing between the nodes (tasks).

    The job manager is the master of the cluster. It coordinates a fleet of workers (some number of registered task managers). The cluster has zero or more applications running at any point in time (the number of running jobs). Collectively, the task managers provide some total number of task slots; some of these are currently in use, and the remainder are currently available.

    (Note that the meaning of the term "job manager" has shifted in the past year or so. In recent versions of Flink there is a separate job manager for each job, and the Flink Master manages a cluster that may have many job managers. Previously, a single job manager would manage the cluster and its many jobs on its own. Not all of the documentation fully reflects this refactoring of the job manager monolith into a few separate components, one of which retains the name "job manager".)
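
To make the relationship between these numbers concrete, here is a minimal sketch in plain Python. It does not call Flink at all: the `TaskManager` class, the `cluster_overview` function, and the dictionary keys are hypothetical names chosen for illustration, and the slot counts are made up. It only shows how the cluster-level figures described above could be derived from per-worker slot counts.

```python
from dataclasses import dataclass


@dataclass
class TaskManager:
    """Hypothetical model of a worker that offers a fixed number of slots."""
    name: str
    total_slots: int
    used_slots: int = 0


def cluster_overview(task_managers, running_jobs):
    """Aggregate per-worker slot counts into cluster-level numbers."""
    total = sum(tm.total_slots for tm in task_managers)
    used = sum(tm.used_slots for tm in task_managers)
    return {
        "registered_task_managers": len(task_managers),
        "running_jobs": len(running_jobs),
        "total_slots": total,
        # Available slots are whatever is not currently in use.
        "available_slots": total - used,
    }


# Illustrative numbers only: two workers with 4 slots each,
# and one running job that occupies 3 slots in total.
workers = [
    TaskManager("tm-1", total_slots=4, used_slots=2),
    TaskManager("tm-2", total_slots=4, used_slots=1),
]
overview = cluster_overview(workers, running_jobs=["example-job"])
print(overview)
```

In this toy model, adding a worker raises both the registered task manager count and the total slot count, while starting a job only reduces the available slots.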