python, apache-kafka, apache-kafka-streams, kafka-python, stream-processing

Does the Kafka Python API support stream processing?


I have used Kafka Streams in Java, but I could not find a similar API in Python. Does Apache Kafka support stream processing in Python?


Solution

  • Kafka Streams itself is only available as a JVM library, but there are a few comparable Python implementations of it.

    In theory, you could try using Jython or Py4J to work with the JVM implementation, but that would probably require more work than it is worth.

    Outside of those options, you can also try Apache Beam, Flink, or Spark, but each of them requires an external cluster scheduler to scale out (and a Java installation).

    If you are okay with HTTP methods, then running a ksqlDB instance (again, requiring Java for that server) and invoking its REST interface from Python with the built-in SQL functions can work. However, building your own functions there will require writing JVM-compiled code, last I checked.

    If none of those options are suitable, then you're stuck with the basic consumer/producer methods.
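
To make the "basic consumer/producer" fallback concrete, here is a minimal sketch of a consume-transform-produce loop, assuming the kafka-python client. The topic names, the group id, the broker address, and the `shout` transform are all hypothetical placeholders, and the loop only relies on records having a `.value` attribute and the producer having `send` and `flush`.

```python
def process_stream(consumer, producer, out_topic, transform):
    """Read records, transform each value, and forward non-None results.

    `consumer` is any iterable of records with a `.value` attribute and
    `producer` is any object with `send(topic, value=...)` and `flush()`,
    which is the shape kafka-python's KafkaConsumer and KafkaProducer expose.
    Returns the number of records forwarded.
    """
    forwarded = 0
    for record in consumer:
        result = transform(record.value)
        if result is None:  # a transform returning None acts as a filter
            continue
        producer.send(out_topic, value=result)
        forwarded += 1
    producer.flush()  # block until buffered sends have gone out
    return forwarded


def shout(value):
    """Example transform: drop blank values, upper-case the rest (bytes in, bytes out)."""
    return value.upper() if value.strip() else None


def run_against_broker():
    """Wire the loop to a real broker. Not called automatically."""
    from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

    consumer = KafkaConsumer(
        "input-topic",                       # hypothetical topic name
        bootstrap_servers="localhost:9092",  # assumed broker address
        group_id="demo-processor",           # hypothetical consumer group
        auto_offset_reset="earliest",
        consumer_timeout_ms=10000,           # stop iterating after 10 s without messages
    )
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    return process_stream(consumer, producer, "output-topic", shout)
```

This covers stateless map and filter steps only. Windowing, joins, state stores, and exactly-once guarantees, which Kafka Streams provides out of the box, would all have to be built by hand on top of a loop like this.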
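
For the ksqlDB route mentioned above, a rough sketch of calling the REST interface from Python with only the standard library might look like the following. It assumes a ksqlDB server reachable at `http://localhost:8088` and uses the `/ksql` statement endpoint. The helper names, and the stream and column names in the usage comment, are made up for illustration.

```python
import json
import urllib.request

# Media type used by the ksqlDB REST API for requests and responses.
KSQL_CONTENT_TYPE = "application/vnd.ksql.v1+json"


def build_ksql_request(server_url, statement):
    """Build (but do not send) a POST to the server's /ksql endpoint."""
    payload = {"ksql": statement, "streamsProperties": {}}
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/ksql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": KSQL_CONTENT_TYPE, "Accept": KSQL_CONTENT_TYPE},
        method="POST",
    )


def run_statement(server_url, statement):
    """Send the statement and return the decoded JSON response."""
    request = build_ksql_request(server_url, statement)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs a running server; `events` and `message` are hypothetical):
# run_statement(
#     "http://localhost:8088",
#     "CREATE STREAM shouted AS SELECT UCASE(message) AS message "
#     "FROM events EMIT CHANGES;",
# )
```

Here the processing itself runs inside the ksqlDB server and Python only submits the statement, so built-in functions such as `UCASE` are available but custom logic is not.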