Tags: java, apache-kafka, etl, apache-kafka-streams, stream-processing

Using Kafka Streams for Custom transformation


I have been implementing an ETL data pipeline using Apache Kafka. I have used Kafka Connect for Extraction and Load.

Connect reads the source data and writes it to a Kafka topic in JSON form.

In the Transform phase, I want to read the JSON data from a Kafka topic, convert it into SQL queries based on some custom business logic, and then write those to an output Kafka topic.

As of now, I have written a producer-consumer application which reads from the topic, does the transformation, and then writes to the output topic.

Is it possible to achieve the same using the Kafka Streams API? If yes, please provide some samples.


Solution

  • Check out Kafka Streams or KSQL. KSQL runs on top of Kafka Streams and gives you a very simple way to build the kind of transformations and aggregations you're talking about.

    Here's an example of aggregating a stream of data in KSQL:

    SELECT PAGE_ID, COUNT(*) FROM PAGE_CLICKS WINDOW TUMBLING (SIZE 1 HOUR) GROUP BY PAGE_ID;
    

    See more at: https://www.confluent.io/blog/using-ksql-to-analyse-query-and-transform-data-in-kafka

    You can take the output of KSQL which is actually just a Kafka topic, and stream that through Kafka Connect e.g. to Elasticsearch, Cassandra, and so on.
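If you'd rather stay in plain Java, the same read-transform-write pipeline can be sketched with the Kafka Streams DSL. This is a minimal sketch, not a complete solution: the topic names (`source-json-topic`, `sql-output-topic`) and the `toInsertSql` method are placeholders standing in for your real topics and custom business logic.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class JsonToSqlApp {

    // Hypothetical transform: wrap the JSON payload in an INSERT statement.
    // Replace this with your real JSON parsing and SQL-generation logic.
    static String toInsertSql(String json) {
        String escaped = json.replace("'", "''"); // naive single-quote escaping
        return "INSERT INTO events (payload) VALUES ('" + escaped + "');";
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "json-to-sql");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
                  Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
                  Serdes.String().getClass());

        // Build the topology: read JSON strings, map each value to a SQL
        // statement, and write the result to the output topic.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("source-json-topic");
        source.mapValues(JsonToSqlApp::toInsertSql)
              .to("sql-output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Compared to a hand-rolled producer-consumer pair, this gets you offset management, scaling by adding instances, and fault tolerance for free; `mapValues` is the natural fit here because the record key is untouched, so no repartitioning is triggered.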