I have a stream s1 with a messages column of type ARRAY<MAP<VARCHAR, VARCHAR>>, like below:
ROWTIME key messages
-------------------------------
t1 1 [{id: 1, k1: v1, k2: v2}, {id: 2, k1: v3, k2: v4}]
t2 2 [{id: 1, k1: v5, k2: v6}, {id: 2, k1: v7, k2: v8}]
.......
.......
I am creating another stream s2 using:
create stream s2 as select explode(messages) as message from s1 emit changes;
ROWTIME message
-----------------------------
t1 {id: 1, k1: v1, k2: v2}
t1 {id: 2, k1: v3, k2: v4}
t2 {id: 1, k1: v5, k2: v6}
t2 {id: 2, k1: v7, k2: v8}
...........
...........
My aim is to create a table with id, k1, and k2 columns. I am publishing in array format in s1 to make sure that all rows from one array are updated in the table together.
create stream s3 as select message['id'] as id, message['k1'] as k1, message['k2'] as k2 from s2 emit changes;
create table table1 as select id, latest_by_offset(k1) as k1, latest_by_offset(k2) as k2 from s3 group by id emit changes;
With the above, is there any guarantee that all the messages exploded from a single array (of any count; currently the count is 2) will be applied to table1 at once? In other words, is there a guarantee that the state below is never possible, where only id 1 from timestamp t2 has been applied to table1 but id 2 from timestamp t2 has not?
ROWTIME id k1 k2
----------------------------------------
t1 2 v3 v4
t2 1 v5 v6
This isn't currently guaranteed by ksqlDB, though it is potentially possible to enhance ksqlDB to support it. It is probably worth raising a feature request.
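
As a rough way to see this for yourself, a push query on the table streams each row change as it is applied. Assuming the table is named table1 as in the question, the rows for id 1 and id 2 have different keys, so they arrive as separate updates rather than as one atomic change. This is only an observation aid, not a fix:

```sql
-- Push query: emits every change to table1 as it happens.
-- Rows exploded from the same array arrive as separate updates, one per id.
SELECT * FROM table1 EMIT CHANGES;
```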