As explained in the comprehensive article Crossing the Streams. The Outer KStream-KStream Join emits element as soon as it arrives, even before waiting for its match in another K-Stream. Downside of this is that it duplicates not-joined event along with every joined event.
Can you suggest any alternate way to implement a join of events without duplicating(as in outer join) or missing(as in inner join)?
As per the same click-view events example:
KStream<String, JsonNode> joinedEventsStream =
clickEventsStream.outerJoin(viewEventsStream,
(clickEvent, viewEvent) -> processJoin(clickEvent, viewEvent),/* Fire quickly if match found,*/
/* else fire after 2 seconds */
JoinWindows.of(Duration.ofSeconds(2L)), StreamJoined.with(Serdes.String(), jsonSerde, jsonSerde)
);
Expected results are explained below:
Atm (Kafka 2.7.0) the behavior is as describe in the blog post. This question came up multiple times already, and we create a ticket recently to change the behavior: https://issues.apache.org/jira/browse/KAFKA-10847
Atm, you could use a downstream stateful operation after the join, to buffer records until the window end (or maybe better, window close, ie, window end plus grace period) is reached. This allows you to filter out spurious left/outer join result.