I have three ksqldb tables, whose relation is illustrated in the picture below. I must join them.
This query will result in an error:
CREATE TABLE `reviewer-email-user`
AS SELECT *
FROM USER
INNER JOIN REVIEWER ON USER.USERID = REVIEWER.USERID
INNER JOIN EMAILADDRESS ON USER.USERID = EMAILADDRESS.USERID
EMIT CHANGES;
And the error is:
Could not determine output schema for query due to error: Invalid join condition: table-table joins require to join on the primary key of the right input table. Got USER.USERID = REVIEWER.USERID
So, how do I join these three ksqldb tables? Thank you.
table-table joins expect the join condition to have the primary key on the right side and therefore, the following will not work:
ksql> CREATE TABLE reviewer_user
> AS SELECT *
> FROM REVIEWER
> INNER JOIN USER ON USER.user_id = REVIEWER.user_id
>EMIT CHANGES;
Could not determine output schema for query due to error: Cannot add table 'REVIEWER_USER': A table with the same name already exists
Statement: CREATE TABLE REVIEWER_USER WITH (KAFKA_TOPIC='REVIEWER_USER', PARTITIONS=2, REPLICAS=1) AS SELECT *
FROM REVIEWER REVIEWER
INNER JOIN USER USER ON ((USER.USER_ID = REVIEWER.USER_ID))
EMIT CHANGES;
however the following query does work (note that I have flipped the left and right side of the join)
ksql> CREATE TABLE reviewer_user
> AS SELECT *
> FROM REVIEWER
> INNER JOIN USER ON REVIEWER.user_id = USER.user_id
>EMIT CHANGES;
Message
---------------------------------------------
Created query with ID CTAS_REVIEWER_USER_11
---------------------------------------------
Another limitation of table-table joins is that it doesn't support n-way joins, so you would have to create 2 new tables(reviewer_user and email_user) by performing joins as suggested above and then finally perform a join on them to get your final result.