In my setup, Flink sends its metrics to Datadog. The Datadog host map is shown below (I have no idea why it shows latency here).
Flink metrics are sent to localhost. My flink-conf.yaml configuration is as follows:
# adding metrics
metrics.reporters: stsd , dghttp
metrics.reporter.stsd.class: org.apache.flink.metrics.statsd.StatsDReporter
metrics.reporter.stsd.host: localhost
metrics.reporter.stsd.port: 8125
# for datadog
metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.apikey: xxx
metrics.reporter.dghttp.tags: host:localhost, job_id : jobA , tm_id : task1 , operator_name : operator1
metrics.scope.operator: numRecordsIn
metrics.scope.operator : numRecordsInPerSecond
metrics.scope.operator : numRecordsOut
metrics.scope.operator : numRecordsOutPerSecond
metrics.scope.operator : latency
The issue is that Datadog shows 163 metrics, which I don't understand; I will explain below.
I also don't understand the metric format in Datadog, which shows me metrics like this:
As shown in the image above:
- Latency is expressed as a time
- Number of events per second is in events/sec
- Count is some value
So my question is: which metric is this?
Also, the execution plan of my job looks like this:
How do I relate the metrics in Datadog to the execution plan operators in Flink?
I have read in the Flink 1.3.2 API docs that I can use tags. I tried to use them in flink-conf.yaml, but I don't fully understand what they do here.
My ultimate goal is to find, for each operator, the latency and the number of records in and out per second.
There are a variety of issues here.
1. You've misconfigured the scope formats (metrics.scope.operator).
For one, the configuration doesn't make sense because you specify "metrics.scope.operator" multiple times; only the last entry is honored.
Second, and more importantly, you have misunderstood what scope formats are used for.
Scope formats configure which context information (like the ID of the task) is included in the reported metric's name.
By setting it to a constant ("latency") you've told Flink not to include any context. As a result, the numRecordsIn metric for every operator is reported as "latency.numRecordsIn".
I suggest simply removing your scope configuration.
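To illustrate what a scope format is meant to look like, here is a sketch of the operator scope written out with its variables. To my knowledge this is the default operator scope in Flink, so leaving the option unset should give you the same result; treat the exact variable list as something to check against the docs for your version.

```yaml
# Sketch only: a scope format is a template built from variables such as
# <host> or <operator_name>, not a metric name like "latency".
# This is (as far as I know) the default, so the line can simply be omitted.
metrics.scope.operator: <host>.taskmanager.<tm_id>.<job_name>.<operator_name>.<subtask_index>
```

With a format like this, the context of each operator stays attached to the metrics it reports, instead of every operator collapsing into the same "latency.numRecordsIn" name.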
2. You've misconfigured the Datadog tags.
I do not understand what you were trying to do with your tags configuration.
The tags configuration option can only be used to provide global tags, i.e. tags that are attached to every single metric, like "Flink".
By default, every metric that the Datadog reporter sends has a tag attached to it for every available scope variable.
So, if you have an operator named A, then its numRecordsIn metric will be reported with the tag "operator_name:A".
Again, I would suggest simply removing your tags configuration.
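Putting both suggestions together, a cleaned-up configuration might look roughly like the sketch below. It reuses the reporter settings from the question and drops every metrics.scope.operator line. The single global tag is optional and its value ("flink") is only an illustration.

```yaml
# adding metrics
metrics.reporters: stsd, dghttp
metrics.reporter.stsd.class: org.apache.flink.metrics.statsd.StatsDReporter
metrics.reporter.stsd.host: localhost
metrics.reporter.stsd.port: 8125
# for datadog
metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.apikey: xxx
# Optional global tag attached to every metric; the value is an example.
# Per-operator tags such as operator_name are added by the reporter itself.
metrics.reporter.dghttp.tags: flink
```

With the per-operator tags in place, you should be able to filter or group a metric such as numRecordsIn by the operator_name tag in Datadog and match it against the operators in your execution plan.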