Search code examples
apache-kafkaapache-kafka-streamsksqldb

Counting all entries with KSQL


Is it possible to use KSQL to not only count entries of a specific column via GROUP BY but instead get an aggregate over all the entries that stream through the application?

I'm searching for something like this:

| Count all | Count id1 | count id2 |
| ---245----|----150----|----95-----|

Or more like this in KSQL:

[some timestamp] | Count all | 245   
[some timestamp] | Count id1 | 150   
[some timestamp] | Count id2 | 95   
.   
.   
.   

Thank you
- Tim


Solution

  • You cannot have both counts for the all and count for each key in the same query. You can have two queries here, one for counting each value in the given column and another for counting all values in the given column. Let's assume you have a stream with two columns, col1 and col2. To count each value in col1 with infinite window size you can use the following query:

    SELECT col1, count(*) FROM mystream1 GROUP BY col1;
    

    To count all the rows you need to write two queries since KSQL always needs GROUP BY clause for aggregation. First you create a new column with constant value and then you can count the values in new column and since it is a constant, the count will represent the count of all rows. Here is an example:

    CREATE STREAM mystream2 AS SELECT 1 AS col3 FROM mystream1;
    SELECT col3, count(*) FROM mystream2 GROUP BY col3;