I'm trying to understand how streams and materialized views work in ksqldb, and how they need to be configured.
One detail in the syntax left me curious, though: why are the stream properties mandatory for CREATE STREAM, but optional for materialized stream views created with CREATE STREAM AS SELECT?
CREATE STREAM syntax (notice the "WITH ..." part is not in brackets and thus mandatory) [1]:
CREATE [OR REPLACE] [SOURCE] STREAM [IF NOT EXISTS] stream_name
( { column_name data_type [KEY | HEADERS | HEADER(key)] } [, ...] )
WITH ( property_name = expression [, ...] );
CREATE STREAM AS SELECT syntax (notice the "WITH ..." part is in brackets and thus optional) [2]:
CREATE [OR REPLACE] STREAM stream_name
[WITH ( property_name = expression [, ...] )]
AS SELECT select_expr [, ...]
FROM from_stream
[[ LEFT | FULL | INNER ]
JOIN [join_table | join_stream]
[WITHIN [<size> <timeunit> | (<before_size> <timeunit>, <after_size> <timeunit>)]
[GRACE PERIOD <grace_size> <timeunit>]]
ON join_criteria]*
[ WHERE condition ]
[PARTITION BY column_name]
EMIT CHANGES;
I assumed the stream properties would be required in both cases, because both statements create or replace a stream. Is that assumption wrong?
[1] https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/create-stream/
[2] https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/create-stream-as-select/
The WITH clause properties are mandatory for CREATE STREAM statements because some of the properties have no default value, and therefore cannot be omitted. In particular, the "kafka_topic" property, which specifies the name of the Kafka topic to read data from (or create), does not have a default value. The "value_format" property also does not have a default value, unless the server config "ksql.persistence.default.format.value" is explicitly set [1].
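As a rough illustration of the point above, here is a minimal CREATE STREAM sketch. The stream name, column names, and topic name are hypothetical, and it assumes the Kafka topic already exists and that "ksql.persistence.default.format.value" is not set on the server.

```sql
-- Hypothetical stream over an existing Kafka topic named 'pageviews'.
-- KAFKA_TOPIC has no default, so it must always be given.
-- VALUE_FORMAT must be given unless the server config
-- ksql.persistence.default.format.value is set.
CREATE STREAM pageviews (
  user_id VARCHAR KEY,
  page_id VARCHAR,
  view_time BIGINT
) WITH (
  KAFKA_TOPIC = 'pageviews',
  VALUE_FORMAT = 'JSON'
);
```

ksqlDB has nothing to infer these values from here: the statement only declares a schema over a topic, so the topic name and serialization format have to come from the WITH clause (or, for the format, from server config).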
This is in contrast to the WITH clause properties for CREATE STREAM AS SELECT statements which all have default values. For example, the "kafka_topic" property now defaults to the name of the stream being created (possibly with a prefix, if the server config "ksql.output.topic.name.prefix" is specified [2]), and the different formats and number of partitions and replicas default to the values for the corresponding source topic.
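To make the contrast concrete, here is a sketch of CREATE STREAM AS SELECT with and without a WITH clause. It assumes a source stream named pageviews with columns user_id, page_id, and view_time already exists; all names and values are hypothetical.

```sql
-- No WITH clause: the output topic name is derived from the stream name
-- (plus ksql.output.topic.name.prefix, if configured), and the formats,
-- partition count, and replica count are taken from the source topic.
CREATE STREAM pageviews_home AS
  SELECT user_id, page_id, view_time
  FROM pageviews
  WHERE page_id = 'home'
  EMIT CHANGES;

-- Optional WITH clause: override only the defaults you care about.
CREATE STREAM pageviews_about
  WITH (KAFKA_TOPIC = 'pageviews-about', PARTITIONS = 3) AS
  SELECT user_id, page_id, view_time
  FROM pageviews
  WHERE page_id = 'about'
  EMIT CHANGES;
```

In both statements the source stream supplies sensible defaults, which is why the WITH clause can be left out entirely.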
There's nothing stopping ksqlDB from making all WITH clause properties optional, including for CREATE STREAM statements, in order to have parity between the two types of statements. In fact, there was previous discussion about this among the developers but it wasn't followed through on [3].
[1] https://docs.ksqldb.io/en/latest/reference/server-configuration/#ksqlpersistencedefaultformatvalue
[2] https://docs.ksqldb.io/en/latest/reference/server-configuration/#ksqloutputtopicnameprefix