Search code examples
apache-flinkflink-streaming

Why cannot I see watermark at the source but the subsequent operator in flink?


I have a flink application where I read the data from two kafka sources and perform the join operation against the two streams

I am setting the watermark strategy at the source like

     DataStreamSource<Event> dataStream =  env.fromSource(
                source, watermarkStrategy, id, typeInformation);

Now when I view the application in the flink UI, I don't see any watermark at the source but I could see it in the join operator. The value matches with what I would expect being emitted from the source. Why am I not seeing it being emitted from the source in the UI?

I added the breakpoint and debugged and I could see the watermark being generated at the source

enter image description here

enter image description here

enter image description here


Solution

  • Conceptually, watermarks are generated by the Flink runtime, not the source, and are injected into the stream after the source. So the source operator (what you see in the UI) technically doesn't know about the watermark value.

    Also note that the value of a watermark at an operator (for each sub-task) is the minimum of all incoming channels. So the Flink runtime would need to add support to capture and report the current outbound watermark for each sub-task.