Is there a guarantee that all exploded rows in a stream will be applied to a table at once?


Suppose I have a stream s1 with a messages column of type ARRAY<MAP<VARCHAR, VARCHAR>>, like below:

ROWTIME    key   messages
-------------------------------
t1          1     [{id: 1, k1: v1, k2: v2}, {id: 2, k1: v3, k2: v4}]
t2          2     [{id: 1, k1: v5, k2: v6}, {id: 2, k1: v7, k2: v8}]
.......
.......

I am creating another stream, s2, using

create stream s2 as select explode(messages) as message from s1 emit changes;

ROWTIME           message
-----------------------------
t1              {id: 1, k1: v1, k2: v2}
t1              {id: 2, k1: v3, k2: v4}
t2              {id: 1, k1: v5, k2: v6}
t2              {id: 2, k1: v7, k2: v8}
...........
...........

My aim is to create a table with id, k1, and k2 columns. I am publishing the rows in array format in s1 to make sure that they are both updated in the table together.

create stream s3 as select message['id'] as id, message['k1'] as k1, message['k2'] as k2 from s2 emit changes;
create table table1 as select id, latest_by_offset(k1) as k1, latest_by_offset(k2) as k2 from s3 group by id emit changes;

With the above, is there any guarantee that all the messages exploded from a single array (of any size; currently 2) will be applied to table1 at once? In other words, is there a guarantee that the state below is never possible, where id 1 from timestamp t2 has been applied to table1 but id 2 from timestamp t2 has not?

ROWTIME       id        k1          k2
----------------------------------------
t1             2        v3          v4
t2             1        v5          v6

Solution

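  • This isn't currently guaranteed by ksqlDB. Each exploded element becomes a separate record in s2, and table1 is updated one record at a time, so the intermediate state shown in the question can be observed. It is potentially possible to enhance ksqlDB to support this, so it is probably worth raising a feature request.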
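To illustrate why the intermediate state is reachable, here is a minimal Python sketch of the pipeline from the question. It is a toy model, not ksqlDB code: the helpers `explode` and `apply_record` are hypothetical names, and it assumes records are applied to the table strictly one at a time, which simplifies how ksqlDB actually runs.

```python
# Toy model of the s1 -> s2 -> s3 -> table1 pipeline; this is NOT ksqlDB code.
def explode(row):
    """Yield one (rowtime, message) record per array element, like EXPLODE."""
    rowtime, _key, messages = row
    for message in messages:
        yield rowtime, message


def apply_record(table, rowtime, message):
    """Upsert a single record into the table, keyed by id (latest value wins)."""
    table[message["id"]] = {"rowtime": rowtime, "k1": message["k1"], "k2": message["k2"]}


# Sample rows taken from the question: (ROWTIME, key, messages).
s1 = [
    ("t1", 1, [{"id": "1", "k1": "v1", "k2": "v2"}, {"id": "2", "k1": "v3", "k2": "v4"}]),
    ("t2", 2, [{"id": "1", "k1": "v5", "k2": "v6"}, {"id": "2", "k1": "v7", "k2": "v8"}]),
]

table1 = {}
snapshots = []  # copy of the table state after each individual record is applied
for row in s1:
    for rowtime, message in explode(row):
        apply_record(table1, rowtime, message)
        snapshots.append({k: dict(v) for k, v in table1.items()})

# After the third record, id 1 is already at t2 while id 2 is still at t1.
print(snapshots[2])
```

In this model, a reader that queries the table between the third and fourth records sees exactly the state described in the question. Keeping both elements in one row (for example, keyed by the original key rather than by id) would avoid the split, at the cost of a different table layout.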