apache-kafka, kafka-consumer-api, prometheus, jmx-exporter

How Does Prometheus Scrape a Kafka Topic?


I’m a network guy trying to build my first Kafka --> Prometheus --> Grafana pipeline. My Kafka broker has a topic which is being populated by an external producer. That’s great. But I can’t figure out how to configure my Prometheus server to scrape data from that topic as a Consumer.

I should also say that my Kafka node is running on my host Ubuntu machine (not in a Docker container), and I run an instance of the JMX Exporter agent alongside Kafka. Here’s how I start up Kafka on the Ubuntu command line:

KAFKA_OPTS="$KAFKA_OPTS -javaagent:/home/me/kafka_2.11-2.1.1/jmx_prometheus_javaagent-0.6.jar=7071:/home/me/kafka_2.11-2.1.1/kafka-0-8-2.yml" \
  ./bin/kafka-server-start.sh config/server.properties &

Okay. My Prometheus (also a host process, not the Docker container version) can successfully pull a lot of metrics off of my Kafka broker. So I just need to figure out how to get Prometheus to read the messages within my topic. And I wonder if those messages are already visible somewhere? My topic is called “vflow.sflow,” and when I look at the “scrapeable” metrics that are available from Kafka (TCP 7071), I do see these:

From http://localhost:7071/metrics:

kafka_cluster_partition_replicascount{partition="0",topic="vflow.sflow",} 1.0
kafka_cluster_partition_insyncreplicascount{partition="0",topic="vflow.sflow",} 1.0
kafka_log_logendoffset{partition="0",topic="vflow.sflow",} 1.5357405E7
kafka_cluster_partition_laststableoffsetlag{partition="0",topic="vflow.sflow",} 0.0
kafka_log_numlogsegments{partition="0",topic="vflow.sflow",} 11.0
kafka_cluster_partition_underminisr{partition="0",topic="vflow.sflow",} 0.0
kafka_cluster_partition_underreplicated{partition="0",topic="vflow.sflow",} 0.0
kafka_log_size{partition="0",topic="vflow.sflow",} 1.147821017E10
kafka_log_logstartoffset{partition="0",topic="vflow.sflow",} 0.0

“Partition 0,” “Log Size,” “Log End Offset”… all those things look promising… I guess?

But please bear in mind that I’m completely new to the Kafka/JMX/Prometheus ecosystem. Question: do the above metrics describe my “vflow.sflow” topic? Can I use them to configure Prometheus to actually read the messages within the topic?

If so, can someone recommend a good tutorial for this? I’ve been playing around with my Prometheus YAML config files, but all I manage to do is crash the Prometheus process. Yes, I have been reading the large amount of online documentation and forum posts out there. It’s a lot of information to digest, and it’s very, very easy to invest hours in documentation that proves to be a dead end.
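
For reference, I gather a minimal scrape job for that JMX Exporter endpoint should look something like this (the job name is just a placeholder I made up):

scrape_configs:
  - job_name: 'kafka-jmx'              # placeholder name for this scrape job
    static_configs:
      - targets: ['localhost:7071']    # the JMX Exporter port from the -javaagent flag above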

Any advice for a newbie like me? General advice like “you’re on the right track, next look at X” or “you obviously don’t understand Y, spend more time looking at Z” will definitely be appreciated. Thanks!


Solution

  • When you add that -javaagent argument to the Kafka process, the JMX Exporter exposes the broker's JMX MBeans for Prometheus to scrape, not any actual topic data, since Prometheus isn't a Kafka consumer

    From that JMX information, you'd see metrics such as message rates, log offsets, and replica counts (the kafka_log_* and kafka_cluster_partition_* series you listed above), but never the message contents

    If you'd like to read the topic data itself, the Kafka Connect framework could be used; there are sink connector plugins for InfluxDB, MongoDB, and Elasticsearch, which are all good Grafana data sources (a rough standalone sink config is sketched below). I'm not sure there's a direct Kafka-to-Prometheus importer; since Prometheus pulls metrics rather than accepting pushed data, I think that route would require using the PushGateway
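
    As a sketch of that Connect approach (assuming Confluent's Elasticsearch sink connector is installed and Elasticsearch is reachable at localhost:9200; the connector choice, the URL, and the connector name are all assumptions for illustration, not something from your setup), a standalone sink properties file would look roughly like this:

    # Any unique name for this connector instance
    name=vflow-sflow-es-sink
    connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
    tasks.max=1
    # The topic whose messages should be indexed
    topics=vflow.sflow
    # Assumed Elasticsearch endpoint; adjust to your environment
    connection.url=http://localhost:9200
    key.ignore=true
    schema.ignore=true
    type.name=_doc

    You'd then launch it with the standalone Connect script that ships with Kafka (something like ./bin/connect-standalone.sh config/connect-standalone.properties es-sink.properties, where es-sink.properties is the file above) and point Grafana at Elasticsearch rather than Prometheus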