Search code examples
apache-kafkaapache-kafka-connectksqldb

KSQL joining Kafka stream to KTable


I am trying to perform a join between a table and stream but for some reason, it is not working.

ksql> SELECT GROUP_TABLE.SOCIETY_ID, GROUP_TABLE.ID FROM GROUP_TABLE;
1 | 2
^CQuery terminated
ksql> SELECT DEVICE_STREAM.GROUP_ID FROM DEVICE_STREAM
2
^CQuery terminated
ksql> SELECT GROUP_TABLE.SOCIETY_ID  FROM DEVICE_STREAM LEFT JOIN GROUP_TABLE ON DEVICE_STREAM.GROUP_ID = GROUP_TABLE.ID;
null

I am expecting 1 instead of a null value. GROUP_TABLE is created from a topic synced to Kafka through Postgres JDBC connector.

The above behavior is for the existing data. If I happen to enter new data inside Postgres, the join statement returns the correct value


Solution

  • It's mostly likely the timestamps on the data. A stream-table join is more of a lookup/enrichment that an traditional SQL join. ksqlDB tries its best to process data in time order. If the timestamp on the DEVICE_STREAM is before the GROUP_TABLE row, then ksqlDB is not joining the data as the row does not exist in the table yet.