apache-kafka ksqldb

ksql get distinct values in a column


How do I achieve the equivalent of the following in ksql

select distinct(`columnName`) from stream_name;

without using a table? I don't want to have a table compacted topic in my broker.


Solution

  • You have an endless stream, so an aggregation is needed. Without a table, events will be silently ignored, e.g. with COLLECT_SET:

    COLLECT_SET(col1) => ARRAY
    Returns an array containing the distinct values of col1 from each input row (for the specified grouping and time window, if any).

    However,

    The size of the result ARRAY can be limited to a maximum of ksql.functions.collect_set.limit entries, and any values beyond this limit are ignored silently.


    "don't want to have a table compacted topic in my broker"

    You already have at least one compacted topic by default: __consumer_offsets
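
For illustration, the COLLECT_SET approach described above could be sketched as a transient push query, i.e. a plain SELECT with no CREATE TABLE statement. This is only a sketch under assumptions: `group_col` is a hypothetical grouping column that does not appear in the question, and the stream and column names are taken from the question as written. ksqlDB may still create temporary internal topics while such a query runs, so treat this as a way to avoid a persistent table rather than a guarantee that no topics are created.

```sql
-- Sketch only: group_col is a hypothetical column used for grouping.
-- No CREATE TABLE is issued, so the result is not materialized as a
-- persistent table with its own sink topic.
SELECT group_col,
       COLLECT_SET(`columnName`) AS distinct_values
FROM stream_name
GROUP BY group_col
EMIT CHANGES;

-- Caveat from the documentation quoted above: the result array is capped at
-- ksql.functions.collect_set.limit entries, and values beyond that limit
-- are dropped silently.
```

Each emitted row carries the distinct values seen so far for one value of `group_col`, so the output keeps changing as new events arrive instead of being a one-time distinct list.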