apache-flink, apache-kafka-streams

Flink Dynamic Table vs Kafka Streams KTable?


I was reading about several current limitations of joins in Kafka Streams, such as KTable-KTable non-key joins, KTable-GlobalKTable joins, and so on.

I discovered that Flink seems to support all of them. From what I have read, a dynamic table sounds like a KTable.

First, I wonder whether they are the same concept. Second, how does Flink achieve this? I could not find documentation about the underlying infrastructure. For instance, I did not find the notion of a broadcast join, which is what happens with a GlobalKTable. Is the underlying infrastructure that maintains dynamic tables distributed?


Solution

  • Flink's dynamic table and Kafka's KTable are not the same.

    In Flink, a dynamic table is a very generic and broad concept, namely a table that evolves over time. This includes arbitrary changes (INSERT, DELETE, UPDATE). A dynamic table does not need a primary key or unique attribute, but it might have one.

    • A KStream is a special type of dynamic table, namely one that only receives INSERT changes, i.e., an ever-growing, append-only table.

    • A KTable is another type of dynamic table, namely one that has a unique key and is modified by INSERT, UPDATE, and DELETE changes on that key.
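
To make the distinction concrete, here is a minimal sketch in plain Python. It is a toy model only: it uses neither Flink nor Kafka Streams, and the function names (`append_only_table`, `keyed_table`, `keyless_table`) are made up for illustration. It assumes that a record with a `None` value acts as a delete marker for the keyed table, similar to a tombstone.

```python
from collections import Counter


def append_only_table(records):
    """KStream-like view: every record is an INSERT, so the table only grows."""
    table = []
    for key, value in records:
        table.append((key, value))
    return table


def keyed_table(records):
    """KTable-like view: records are changes on a unique key.

    Assumption for this sketch: a value of None means DELETE for that key.
    """
    table = {}
    for key, value in records:
        if value is None:
            table.pop(key, None)      # DELETE
        else:
            table[key] = value        # INSERT for a new key, UPDATE otherwise
    return table


def keyless_table(changes):
    """General dynamic table without a unique key: a multiset of rows.

    Each change is ("INSERT", row) or ("DELETE", row); an UPDATE can be
    expressed as a DELETE of the old row followed by an INSERT of the new one.
    """
    table = Counter()
    for op, row in changes:
        if op == "INSERT":
            table[row] += 1
        elif op == "DELETE":
            if table[row] == 0:
                raise ValueError(f"cannot delete missing row {row!r}")
            table[row] -= 1
            if table[row] == 0:
                del table[row]
        else:
            raise ValueError(f"unknown change type {op!r}")
    return dict(table)


records = [("alice", 1), ("bob", 5), ("alice", 3)]
stream_view = append_only_table(records)                  # three rows, nothing overwritten
table_view = keyed_table(records + [("bob", None)])       # {"alice": 3}
duplicates = keyless_table([
    ("INSERT", ("alice", 1)),
    ("INSERT", ("alice", 1)),                             # identical row is allowed
    ("DELETE", ("alice", 1)),
])                                                        # {("alice", 1): 1}
```

The same input records yield three rows in the append-only view but a single row in the keyed view, while the keyless table can hold identical rows side by side, which neither a KStream nor a KTable models directly.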

    Flink supports the following types of joins on dynamic tables. Note that the references to Kafka's joins might not be 100% accurate (happy to fix errors!).

    • Time-windowed joins should correspond to KSQL's KStream-KStream joins.
    • Temporal table joins are similar to KSQL's KStream-KTable joins. The temporal relation between both tables needs to be explicitly specified in the query to be able to run the same query with identical semantics on batch/offline data.
    • Regular joins are more generic than KSQL's KTable-KTable joins because they don't require the input tables to have unique keys. Moreover, Flink does not distinguish between primary- or foreign-key joins, but requires that joins are equi-joins, i.e., have at least one equality predicate. At this point, the streaming SQL planner does not support broadcast-forward joins (which I believe should roughly correspond to KTable-GlobalKTable joins).
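
As a rough illustration of a regular join on tables without unique keys, the following plain-Python sketch maintains an inner equi-join incrementally. It is not Flink's implementation or API: the class `RegularJoin` and its `process` method are hypothetical, and the sketch assumes each change arrives as an INSERT or DELETE tagged with the value of the join attribute. Both inputs are kept in state indexed by the join key, so every change only needs to look at rows with the same key on the other side.

```python
from collections import defaultdict


class RegularJoin:
    """Toy incremental inner equi-join of two dynamic tables (hypothetical)."""

    def __init__(self):
        # state[side][join_key] -> rows currently present on that side.
        # Lists are used so that duplicate rows and duplicate keys are allowed.
        self.state = {"left": defaultdict(list), "right": defaultdict(list)}

    def process(self, side, op, key, row):
        """Apply one input change and return the resulting output changes."""
        if side not in self.state:
            raise ValueError(f"unknown side {side!r}")
        other = "right" if side == "left" else "left"

        if op == "INSERT":
            self.state[side][key].append(row)
        elif op == "DELETE":
            self.state[side][key].remove(row)   # raises ValueError if absent
        else:
            raise ValueError(f"unknown change type {op!r}")

        # Every matching row on the other side yields one output change of
        # the same type: inserts add join results, deletes retract them.
        output = []
        for match in self.state[other].get(key, []):
            pair = (row, match) if side == "left" else (match, row)
            output.append((op, pair))
        return output


join = RegularJoin()
join.process("left", "INSERT", "u1", "order-1")            # [] (no match yet)
join.process("left", "INSERT", "u1", "order-2")            # [] (same key twice is fine)
added = join.process("right", "INSERT", "u1", "addr-A")
# added == [("INSERT", ("order-1", "addr-A")), ("INSERT", ("order-2", "addr-A"))]
removed = join.process("left", "DELETE", "u1", "order-1")
# removed == [("DELETE", ("order-1", "addr-A"))]
```

Because the state is indexed by the join key, a design like this could be split across parallel workers by hashing that key, which is one way to read the equi-join requirement above. A broadcast-style join would instead copy one whole input to every worker.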