When I execute the following topology with num.stream.threads: 1, it works just fine. But with num.stream.threads: 8, the processing of projekte is apparently so fast that the two KTables are not fully consumed before the join, so some projekt records end up without a matching mietobjekt or wirtschaftseinheit.

It works flawlessly with GlobalKTables, but I have to use KTables because changes to a mietobjekt or a wirtschaftseinheit must be propagated through.

So, how can I 'wait' or 'delay' execution until both KTables have been consumed completely?

I found this example with a custom join processor and transformer implementation, but it seems like overkill: https://github.com/confluentinc/kafka-streams-examples/blob/master/src/test/java/io/confluent/examples/streams/CustomStreamTableJoinIntegrationTest.java
Function { projekte: KStream<String, ProjektEvent> ->
    Function { projektstatus: KStream<String, ProjektStatusEvent> ->
        Function { befunde: KStream<String, ProjektBefundAggregat> ->
            Function { aufgaben: KStream<String, ProjektAufgabeAggregat> ->
                Function { wirtschaftseinheiten: KTable<String, WirtschaftseinheitAggregat> ->
                    Function { durchfuehrungen: KStream<String, ProjektDurchfuehrungAggregat> ->
                        Function { gruppen: KStream<String, ProjektGruppeAggregat> ->
                            Function { mietobjekte: KTable<String, MietobjektAggregat> ->
                                projekte
                                    // the ValueJoiners for the two KTable joins are omitted in this snippet;
                                    // they merge the projekt event with the matching wirtschaftseinheit/mietobjekt
                                    .leftJoin(wirtschaftseinheiten)
                                    .leftJoin(mietobjekte)
                                    // cogroup is only defined on KGroupedStream, so the joined stream is grouped first
                                    .groupByKey()
                                    .cogroup { _, current, previous: ProjektAggregat ->
                                        previous.copy(
                                            projekt = current.projekt,
                                            wirtschaftseinheit = current.wirtschaftseinheit,
                                            mietobjekt = current.mietobjekt,
                                            projektErstelltAm = current.projektErstelltAm
                                        )
                                    }
                                    .cogroup(projektstatus.groupByKey()) { _, projektstatusEvent, aggregat -> aggregat + projektstatusEvent }
                                    .cogroup(befunde.groupByKey()) { _, befundAggregat, aggregat -> aggregat + befundAggregat }
                                    .cogroup(aufgaben.groupByKey()) { _, aufgabeAggregat, aggregat -> aggregat + aufgabeAggregat }
                                    .cogroup(durchfuehrungen.groupByKey()) { _, durchfuehrungAggregat, aggregat -> aggregat + durchfuehrungAggregat }
                                    .cogroup(gruppen.groupByKey()) { _, gruppeAggregat, aggregat -> aggregat + gruppeAggregat }
                                    .aggregate({ ProjektAggregat() }, Materialized.`as`(projektStoreSupplier))
                                    .toStream()
                                    .filterNot { _, projektAggregat -> projektAggregat.projekt == null }
                                    .transform({ EventTypeHeaderTransformer() })
                            }
                        }
                    }
                }
            }
        }
    }
}
Processing order between topics is based on timestamps. You can increase max.task.idle.ms to get better guarantees on timestamp synchronization.
Thus, if you want to "bootstrap" a KTable, you need to ensure that the record timestamps on the "table topic" are smaller than on the "stream topic".
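One way to control that, shown only as a hedged sketch (the topic name, key, value and producer setup are illustrative assumptions, not part of the question), is to set the record timestamp explicitly when producing to the table topic:

    import java.util.Properties
    import org.apache.kafka.clients.producer.KafkaProducer
    import org.apache.kafka.clients.producer.ProducerConfig
    import org.apache.kafka.clients.producer.ProducerRecord
    import org.apache.kafka.common.serialization.StringSerializer

    fun produceTableRecord(key: String, value: String, tableRecordTimestamp: Long) {
        val props = Properties().apply {
            put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")                    // placeholder
            put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer::class.java)
            put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer::class.java)
        }
        KafkaProducer<String, String>(props).use { producer ->
            // null partition -> default partitioner; tableRecordTimestamp must be
            // smaller than the timestamps on the stream topic so the table side
            // is picked for processing first
            producer.send(ProducerRecord("wirtschaftseinheit-topic", null, tableRecordTimestamp, key, value))
        }
    }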
Also check out these talks: