Search code examples
apache-flink

Flink 'timeWindow' operation not generating output for PopularPlacesFromKafka example file


I'm going through Flink tutorial materials from dataArtisans and for some reason when I get to the sample file PopularPlacesFromKafka.scala I don't get any output sent to stdout.

...
// find popular places
val popularSpots = rides
  // match ride to grid cell and event type (start or end)
  .map(new GridCellMatcher)
  // partition by cell id and event type
  .keyBy( k => k )
  // build sliding window
  .timeWindow(Time.minutes(15), Time.minutes(5))
  // count events in window
  .apply{ (key: (Int, Boolean), window, vals, out: Collector[(Int, Long, Boolean, Int)]) =>
    out.collect( (key._1, window.getEnd, key._2, vals.size) )
  }

// print result on stdout
    popularSpots.print()
...

I've confirmed that data is being pulled from Kafka ok, and it seems to be something when it attempts to do the 'timeWindow' operation that I get no output. If I remove the 'timeWindow' operation I can see the 'keyBy' data being output. Is there something obvious that I'm missing?


Solution

  • In case anyone has this same issue this was my problem.

    My kafka topic had multiple partitions, but was producing all of the test data to a single partition (0), once I had >1 Kafka consumers, all of the consumers except for the one assigned to partition 0 aren't receiving any data, and thus not sending any watermarks down the operator chain - that causes the window functions to stop emitting data (that's also why it works fine with ProcessingTime in those situations). Here's a relevant JIRA about it:

    https://issues.apache.org/jira/browse/FLINK-5479