Search code examples
apache-kafkaksqldb

N-Way table joins in ksqlDB


I have three ksqldb tables, whose relation is illustrated in the picture below. I must join them.

This query will result in an error:

CREATE TABLE `reviewer-email-user`
    AS SELECT *
    FROM USER
        INNER JOIN REVIEWER ON USER.USERID = REVIEWER.USERID
        INNER JOIN EMAILADDRESS ON USER.USERID = EMAILADDRESS.USERID
EMIT CHANGES;

And the error is:

Could not determine output schema for query due to error: Invalid join condition: table-table joins require to join on the primary key of the right input table. Got USER.USERID = REVIEWER.USERID

enter image description here

So, how do I join these three ksqldb tables? Thank you.


Solution

  • table-table joins expect the join condition to have the primary key on the right side and therefore, the following will not work:

    ksql> CREATE TABLE reviewer_user
    > AS SELECT *
    > FROM REVIEWER
    >      INNER JOIN USER ON USER.user_id = REVIEWER.user_id
    >EMIT CHANGES;
    Could not determine output schema for query due to error: Cannot add table 'REVIEWER_USER': A table with the same name already exists
    Statement: CREATE TABLE REVIEWER_USER WITH (KAFKA_TOPIC='REVIEWER_USER', PARTITIONS=2, REPLICAS=1) AS SELECT *
    FROM REVIEWER REVIEWER
    INNER JOIN USER USER ON ((USER.USER_ID = REVIEWER.USER_ID))
    EMIT CHANGES;
    

    however the following query does work (note that I have flipped the left and right side of the join)

    ksql> CREATE TABLE reviewer_user
    > AS SELECT *
    > FROM REVIEWER
    >      INNER JOIN USER ON REVIEWER.user_id = USER.user_id
    >EMIT CHANGES;
    
     Message
    ---------------------------------------------
     Created query with ID CTAS_REVIEWER_USER_11
    ---------------------------------------------
    

    Another limitation of table-table joins is that it doesn't support n-way joins, so you would have to create 2 new tables(reviewer_user and email_user) by performing joins as suggested above and then finally perform a join on them to get your final result.