I'm doing real-time streaming from Twitter and wonder whether there is a way to extract only the messages and certain values from a Kafka topic?
You can use ksqlDB to do this. For example:
ksql> CREATE STREAM TWEETS WITH (KAFKA_TOPIC='twitter_01', VALUE_FORMAT='Avro');
ksql> SELECT USER->SCREENNAME, TEXT FROM TWEETS WHERE TEXT LIKE '%cool%' EMIT CHANGES;
+-------------------+------------------------------------------------------------------------------------------+
|USER__SCREENNAME   |TEXT                                                                                      |
+-------------------+------------------------------------------------------------------------------------------+
|MobileGist         |This is super cool!! Great work @houchens_kim!                                            |
If you want, you can also write the results to a new Kafka topic by creating a stream from the same query:
ksql> CREATE STREAM COOL_TWEETS AS SELECT USER->SCREENNAME, TEXT FROM TWEETS WHERE TEXT LIKE '%cool%' EMIT CHANGES;
Since you tagged Python, it's worth pointing out that you can call ksqlDB from Python using its REST API. Here's an example.
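As a rough sketch of the REST approach, the snippet below sends the same push query to the ksqlDB server's `/query` endpoint using only the Python standard library. The server address `http://localhost:8088` and the helper names `build_query_request` and `stream_rows` are my own assumptions for illustration, not part of ksqlDB; adjust them for your setup. The exact shape of each response line can vary between ksqlDB versions, so this just yields the raw text.

```python
import json
import urllib.request

# Assumption: ksqlDB server listening on its default port on this machine.
KSQLDB_URL = "http://localhost:8088"


def build_query_request(ksql, base_url=KSQLDB_URL, offset="earliest"):
    """Build (but do not send) a POST request for ksqlDB's /query endpoint."""
    payload = {
        "ksql": ksql,
        # Read the topic from the beginning rather than only new messages.
        "streamsProperties": {"ksql.streams.auto.offset.reset": offset},
    }
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/vnd.ksql.v1+json; charset=utf-8",
            "Accept": "application/vnd.ksql.v1+json",
        },
        method="POST",
    )


def stream_rows(ksql, base_url=KSQLDB_URL):
    """Send the query and yield each non-empty response line as text.

    A push query (EMIT CHANGES) keeps the connection open, so this
    generator runs until the caller stops iterating or the server closes.
    """
    with urllib.request.urlopen(build_query_request(ksql, base_url)) as resp:
        for raw in resp:
            line = raw.decode("utf-8").strip()
            if line:
                yield line
```

With a server running and the TWEETS stream already created, you could print results with something like `for line in stream_rows("SELECT USER->SCREENNAME, TEXT FROM TWEETS WHERE TEXT LIKE '%cool%' EMIT CHANGES;"): print(line)`. Statements such as CREATE STREAM go to the `/ksql` endpoint instead of `/query`.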