Tags: join, apache-kafka, stream, hdfs, apache-flink

One source is much slower than the other when joining historical data in Flink


When consuming historical data in a join operator using event time, reading from one source is much slower than reading from the other. As a result, the join operator caches a large amount of data from the faster source while it waits for the slower source.

The question is: how can I reduce the difference in speed between the two consumers?


Solution

  • I'm not sure I understand what you mean by "...make the difference of consumers' speed small". If you want to avoid caching a lot of data, and you can't control the speed of the sources, then I think your only option is to use a smaller window, so that less data is cached. See Window Join for more details on this.
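
To get a feel for how window size relates to the amount of cached data, here is a rough toy model in plain Java. It is not Flink code and does not reflect Flink's actual state handling. The class `WindowBufferSketch` and the method `peakBuffered` are made up for illustration. The model assumes tumbling windows, a fast source that emits one event per time unit, a slow source that trails it by a fixed `lag`, and window state that is dropped once the slower source's progress passes the window end.

```java
// Toy model only: hypothetical names, not part of the Flink API.
class WindowBufferSketch {

    /**
     * Returns the peak number of fast-side events held in window state.
     *
     * @param windowSize tumbling window length, in time units
     * @param lag        how far the slow source trails the fast one, in time units
     * @param totalTime  number of time units to simulate
     */
    static long peakBuffered(long windowSize, long lag, long totalTime) {
        long peak = 0;
        long buffered = 0;
        long purgedUpTo = 0; // events with timestamp < purgedUpTo were already dropped

        for (long t = 0; t < totalTime; t++) {
            buffered++; // the fast source emits an event with timestamp t

            // Assumption: the join can only make progress as far as the slow source.
            long watermark = t - lag;
            if (watermark >= 0) {
                // Every window whose end is <= watermark is complete and can be cleared.
                long cleanEnd = (watermark / windowSize) * windowSize;
                if (cleanEnd > purgedUpTo) {
                    buffered -= (cleanEnd - purgedUpTo);
                    purgedUpTo = cleanEnd;
                }
            }
            peak = Math.max(peak, buffered);
        }
        return peak;
    }

    public static void main(String[] args) {
        long[] windowSizes = {10, 100};
        long[] lags = {0, 500};
        for (long lag : lags) {
            for (long size : windowSizes) {
                System.out.println("lag=" + lag + " window=" + size
                        + " peak=" + peakBuffered(size, lag, 1000));
            }
        }
    }
}
```

In this model the peak grows with both the window size and the lag, so a smaller window reduces what is cached, but it helps most when the window is large compared to how far the slow source is behind. In a real job, the window size would be whatever you pass to the window assigner of the join (for example a tumbling event-time window).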