
Why doesn't the Flink dashboard show the number of records received from the source or written to a sink?


The Flink dashboard is great and shows a lot of detail about running jobs. One thing I have noticed, however, is that a job's source always shows Records Received as 0 and its sink always shows Records Sent as 0.

Now, I know that they are still receiving records from, and sending records to, systems outside the job, but that 0 tends to confuse people. Is there a reason it was designed this way? Or is there a way to make it show something other than 0?

This matters for sinks in particular: if the serialization schema fails to serialize a message (and the error is caught and logged instead of failing the job), the sink's metrics don't show how many records it actually wrote. You always see 0 and might assume everything made it through.


Solution

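  • The reason is that the records the dashboard counts are the ones exchanged between tasks inside the job, while a source reads from, and a sink writes to, an external system. We can't measure that in a generalized fashion, so the measurement has to be implemented in each source and sink individually, which we just haven't found the time to do yet. Another issue is that this would have to be done within user-defined functions, but the relevant metrics are not (yet) accessible from there.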

    See https://issues.apache.org/jira/browse/FLINK-7286.
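Until the built-in counters cover sources and sinks, one workaround for the serialization case is for the sink to keep its own counts. The following is a minimal, Flink-free sketch of that idea, assuming a sink that serializes each record and swallows serialization errors. `CountingSink`, `invoke`, and the counter names are hypothetical and only mirror the shape of a sink function; they are not Flink API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sink that counts what it actually writes.
 * "serializer" stands in for a serialization schema and throws
 * when a record cannot be serialized.
 */
class CountingSink<T> {
    private final Function<T, byte[]> serializer;
    private final List<byte[]> external = new ArrayList<>(); // stand-in for the external system
    private long recordsWritten = 0;         // records that reached the external system
    private long serializationFailures = 0;  // records dropped because serialization failed

    CountingSink(Function<T, byte[]> serializer) {
        this.serializer = serializer;
    }

    /** Called once per incoming record. */
    void invoke(T value) {
        byte[] bytes;
        try {
            bytes = serializer.apply(value);
        } catch (RuntimeException e) {
            // The error is swallowed instead of failing the job (a real job would log it),
            // but the drop is now visible in a counter.
            serializationFailures++;
            return;
        }
        external.add(bytes);
        recordsWritten++;
    }

    long getRecordsWritten() { return recordsWritten; }
    long getSerializationFailures() { return serializationFailures; }

    public static void main(String[] args) {
        // Hypothetical serializer that rejects empty strings.
        CountingSink<String> sink = new CountingSink<>(s -> {
            if (s.isEmpty()) throw new IllegalArgumentException("empty record");
            return s.getBytes(StandardCharsets.UTF_8);
        });
        for (String s : new String[] {"a", "", "b", "c", ""}) {
            sink.invoke(s);
        }
        // prints "written=3 failed=2"
        System.out.println("written=" + sink.getRecordsWritten()
                + " failed=" + sink.getSerializationFailures());
    }
}
```

In an actual Flink job, the two fields would typically be `org.apache.flink.metrics.Counter` instances registered in a rich function's `open()` via `getRuntimeContext().getMetricGroup().counter("...")`, with `inc()` called in place of `++`. Such user-defined metrics can then be selected in the dashboard's Metrics tab. The built-in Records Sent value would still read 0, but the sink's real output and its drops become visible.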