apache-flink

FlinkKafkaConsumer watermark chart legend


I'm trying to understand the diagram at https://ci.apache.org/projects/flink/flink-docs-stable/dev/event_timestamps_watermarks.html#watermark-strategies-and-the-kafka-connector

  • What does a white rectangle with text like N|39 in it mean?
  • What does a yellow rectangle with a number in it mean (with or without a black border)?
  • What does text like W(33) mean?
  • What do the dotted black lines near W(17) mean?
  • What does the number in a grey circle mean, i.e. the (1) or (2) under Source, map, or window?

Solution

  • Here's how I interpret this diagram:

    • a white rectangle with text like N|39 indicates an event for key N with timestamp 39
    • a yellow rectangle with a number in it shows the current watermark for that operator instance
    • W(33) is a Watermark on the wire with a timestamp of 33
    • the dotted black lines near W(17) show that W(17) is part of the streaming data flow
    • the numbers in grey circles, i.e. the (1) or (2) under Source, map, or window indicate parallel instances

    Also, the four orange cylinders are Kafka partitions. Source(1) and Source(2) are each connected to two Kafka partitions. The FlinkKafkaConsumer tracks the maximum timestamp observed so far in each partition, and emits watermarks based on the minimum of those per-partition maximums (after subtracting some bounded delay, in the case of a BoundedOutOfOrderness watermarking strategy).
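
    The per-partition logic described above can be sketched in plain Java. This is a simplified illustration, not Flink's actual implementation: the class name PerPartitionWatermarkSketch and its methods are hypothetical, and details such as idle partitions and periodic emission are left out. It assumes a watermark can only be produced once every partition has delivered at least one event.

```java
import java.util.Arrays;

/** Hypothetical sketch of per-partition watermarking; not a Flink class. */
class PerPartitionWatermarkSketch {
    // Highest event timestamp seen so far in each Kafka partition.
    private final long[] maxTimestampPerPartition;
    // Bounded delay subtracted from the minimum (assumed, as in a bounded-out-of-orderness strategy).
    private final long maxOutOfOrderness;

    PerPartitionWatermarkSketch(int numPartitions, long maxOutOfOrderness) {
        this.maxTimestampPerPartition = new long[numPartitions];
        Arrays.fill(this.maxTimestampPerPartition, Long.MIN_VALUE);
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    /** Record an event read from the given partition. */
    void onEvent(int partition, long timestamp) {
        maxTimestampPerPartition[partition] =
                Math.max(maxTimestampPerPartition[partition], timestamp);
    }

    /** Watermark = min over partitions of the per-partition max, minus the delay. */
    long currentWatermark() {
        long min = Arrays.stream(maxTimestampPerPartition).min().orElse(Long.MIN_VALUE);
        if (min == Long.MIN_VALUE) {
            // At least one partition has produced no events yet: no watermark.
            return Long.MIN_VALUE;
        }
        return min - maxOutOfOrderness;
    }

    public static void main(String[] args) {
        // One source instance reading two partitions, with an assumed delay of 4.
        PerPartitionWatermarkSketch source = new PerPartitionWatermarkSketch(2, 4L);
        source.onEvent(0, 39L);
        source.onEvent(0, 33L); // out of order: does not lower the partition max
        source.onEvent(1, 17L);
        source.onEvent(1, 21L);
        // Per-partition maximums are 39 and 21; min is 21; 21 - 4 = 17.
        System.out.println(source.currentWatermark()); // prints 17
    }
}
```

    Taking the minimum across partitions means the slowest partition holds the watermark back, so events from a lagging partition are not treated as late just because another partition is further ahead.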