apache-kafka, ksqldb

Kafka KSQLDB server logs constantly "found no committed offset for partition"


I run Kafka and a KSQLDB server in headless mode. On the KSQLDB server, I have deployed only a couple of queries to experiment with:

CREATE STREAM pageviews_original (viewtime bigint, userid varchar, pageid varchar) WITH (kafka_topic='pageviews-ksql', PARTITIONS=1, REPLICAS=3, value_format='DELIMITED');

CREATE TABLE users_original (registertime BIGINT, gender VARCHAR, regionid VARCHAR, userid VARCHAR) WITH (kafka_topic='users-ksql', PARTITIONS=1, REPLICAS=3, value_format='JSON', key = 'userid');

CREATE STREAM pageviews_enriched AS SELECT users_original.userid AS userid, pageid, regionid, gender FROM pageviews_original LEFT JOIN users_original ON pageviews_original.userid = users_original.userid;

My problem is that the KSQLDB server is now constantly logging this INFO message:
"found no committed offset for partition _confluent-ksql-ksql-01query_CSAS_PAGEVIEWS_ENRICHED_0-Join-repartition-0".

It's spamming the logs with this message about 10 times per second. The corresponding topic is empty.

What does this mean and how can I fix it?


Solution

  • The log message is output when a streams thread (a thread that does the stream processing) is assigned a topic-partition to process. Before it starts processing, it first checks whether there are any committed offsets, so that it can carry on from where a previous thread finished.

    It's normal to see such log lines when creating a stream or table, as there haven't been any previous threads processing the partition, so there are no committed offsets.

    You may also see such log lines upon restarting your server, or during consumer group rebalancing (more on this below), if no data has been processed through the partition yet.

    Where data has previously been processed, you may see similar log lines, but including details of the last processed offset.
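
    If you want to confirm that, you can inspect the query's consumer group with the standard kafka-consumer-groups tool (kafka-consumer-groups.sh in a plain Apache Kafka install). The group name below is only a guess derived from the topic in your log line, and localhost:9092 is a placeholder; use --list to find the real group and substitute your own bootstrap server:

      # list all consumer groups; ksqlDB persistent queries typically show up as
      # _confluent-ksql-<service id>query_<query id>
      kafka-consumer-groups --bootstrap-server localhost:9092 --list

      # show committed offsets per partition for the (assumed) group of your query;
      # CURRENT-OFFSET appears as "-" where nothing has been committed yet
      kafka-consumer-groups --bootstrap-server localhost:9092 --describe \
        --group _confluent-ksql-ksql-01query_CSAS_PAGEVIEWS_ENRICHED_0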

    What is not normal is to be seeing them all the time! This suggests something is wrong.

    The most likely cause is consumer group rebalancing.

    Consumer groups handle spreading the load across all available stream processing threads, across all clustered ksqlDB servers. When a server is added to or removed from the cluster, the group rebalances to ensure all topic partitions are being processed and the work is spread evenly across all instances. There are configurable timeouts used to detect dead consumers.
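
    The timeouts in play are the standard Kafka consumer settings. As a rough sketch only (the values are illustrative rather than recommendations, and the ksql.streams. prefix is the mechanism ksqlDB uses to pass Kafka Streams and client settings through from ksql-server.properties; check the docs for your version):

      # session.timeout.ms: how long the broker waits for heartbeats before it
      # declares a consumer dead and triggers a rebalance
      ksql.streams.consumer.session.timeout.ms=30000
      # heartbeat.interval.ms: how often heartbeats are sent; usually around a
      # third of the session timeout
      ksql.streams.consumer.heartbeat.interval.ms=10000
      # max.poll.interval.ms: maximum allowed gap between poll() calls before the
      # member is kicked out of the group
      ksql.streams.consumer.max.poll.interval.ms=300000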

    It could be that your consumer groups are unstable and this is causing constant rebalances and hence these log messages. Even then, I wouldn't expect 10s of log lines per second, unless there are many active queries or a high number of topic partitions.
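
    A quick way to check whether the group is stable is to ask for its state (supported by --describe on reasonably recent Kafka versions); repeatedly seeing anything other than Stable, such as PreparingRebalance or CompletingRebalance, points at constant rebalancing. Again, the group name here is an assumption:

      kafka-consumer-groups --bootstrap-server localhost:9092 --describe --state \
        --group _confluent-ksql-ksql-01query_CSAS_PAGEVIEWS_ENRICHED_0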

    If there are consumer group rebalances going on then you should see this in the logs, though you may need to adjust the logging levels to see them.
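
    The rebalance messages come from the consumer coordinator classes ("Revoking previously assigned partitions", "(Re-)joining group", and so on), which log at INFO. A minimal sketch for a log4j 1.x style properties file, as ksqlDB ships with (the file's exact location depends on your install):

      # keep the consumer coordinator loggers at INFO so rebalance activity is visible
      log4j.logger.org.apache.kafka.clients.consumer.internals=INFO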

    There's plenty of information on the net around causes and fixes for unstable consumer groups.