I ran this query:
select * from USER_EVENTS emit changes limit 1;
USER_EVENTS is a stream.
Before this, I set auto.offset.reset to earliest.
The query runs slowly, and I don't know why.
I then ran SHOW QUERIES to get the consumer ID of the above query and looked it up in Kafka Connect.
I found that the query needs to fetch all the messages in the topic, although I only need one row.
Is that true, and why does it need to fetch everything? I thought fetching one message would be enough because I added LIMIT 1 to the query.
The topic behind USER_EVENTS has ~1 million messages.
I use ksqlServer 6.1.0 and the same version for ksqlCli.
This is what ksqlDB is supposed to do: consume the entire stream and materialize a table from it. Your query even says EMIT CHANGES, which means it will go through your messages one by one and update the table in near real time. LIMIT 1 only means that it will show a single message (and update that) instead of showing a growing table, but it consumes the stream either way.
The alternative would be EMIT FINAL, which would only show the final result but still go through the entire stream.
At least to my knowledge, reading a single message without consuming the stream is not possible with ksqlDB.
If you just need to look at one message interactively, I recommend using a CLI tool like kcat or kaf (https://github.com/birdayz/kaf), both of which have an option to consume only a single message.
If you need it programmatically, I would probably write a consumer by hand and simply call poll() once instead of using the standard poll loop.
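A minimal sketch of the "call poll() once" idea, in Python. It assumes a consumer object whose poll(timeout) returns either None or a message with error() and value() methods, which is the shape of the Consumer in the confluent-kafka Python client. The names read_one, FakeConsumer and FakeMessage are made up for illustration; the fake classes only exist so the sketch runs without a broker.

```python
from typing import Optional


def read_one(consumer, timeout_s: float = 10.0) -> Optional[bytes]:
    """Call poll() exactly once and return the payload, or None if nothing arrived."""
    msg = consumer.poll(timeout_s)
    if msg is None:
        # Nothing arrived within the timeout, for example because the
        # partition assignment had not completed yet.
        return None
    if msg.error() is not None:
        raise RuntimeError(f"consumer error: {msg.error()}")
    return msg.value()


# --- Hypothetical in-memory stand-ins so the sketch runs without Kafka ---
class FakeMessage:
    def __init__(self, value: bytes):
        self._value = value

    def error(self):
        return None  # no error on this message

    def value(self) -> bytes:
        return self._value


class FakeConsumer:
    def __init__(self, values):
        self._values = list(values)
        self.polls = 0  # counts how often poll() was called

    def poll(self, timeout=None):
        self.polls += 1
        return FakeMessage(self._values.pop(0)) if self._values else None


consumer = FakeConsumer([b'{"user": "a"}', b'{"user": "b"}'])
first = read_one(consumer)
print(first)  # only the first message is read; the second stays unread
```

With a real client you would pass in a consumer that is already subscribed to the topic behind USER_EVENTS and close it afterwards. Note that a single poll() with a short timeout can return nothing right after subscribing, so a generous timeout is probably needed.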
If you want a "hacky" quick fix, you could also try to set
SET 'auto.offset.reset'='latest';
for your query in ksqlDB. This will still consume the stream, but it starts at the newest available message, so it ignores everything that is already in the topic.