Tags: apache-spark, pyspark, apache-kafka, google-colaboratory, spark-streaming

Is it possible to run both a Kafka consumer and producer in Google Colab? How?


I'm fairly new to Kafka and came across a repo on GitHub (https://github.com/aber0016/Real_Time_Big_Data_Streaming_Spark_Kafka) that defines a Kafka consumer in one notebook and a producer in another. I want to know whether it's possible to run both the producer and the consumer in a single Google Colab notebook. As far as I know, the consumer runs in one terminal and the producer in another, and that setup doesn't seem to work on Colab.

Many thanks in advance for any help; I've been stuck on this for two weeks now.


Solution

  • Yes, it's possible. You need to use batch reading for consumption, though, since streaming jobs run indefinitely.

    Example - https://github.com/OneCricketeer/docker-stacks/blob/master/hadoop-spark/spark-notebooks/kafka-sql.ipynb

    You could also run both without using Spark, e.g. with a plain Python Kafka client such as kafka-python.
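To illustrate the non-Spark route, here is a minimal sketch of a producer and consumer in the same notebook cell. It assumes a Kafka broker is reachable at `localhost:9092` (e.g. started inside the Colab VM) and that the `kafka-python` package is installed; the topic name `demo` is just an example. The key trick for Colab is `consumer_timeout_ms`, which makes the consumer iterator stop instead of blocking forever, in the same spirit as using a batch read instead of a streaming job in Spark.

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize a dict to the UTF-8 JSON bytes Kafka messages carry."""
    return json.dumps(event).encode("utf-8")


def produce_and_consume(bootstrap="localhost:9092", topic="demo"):
    """Send a few messages, then read them back, all in one cell."""
    from kafka import KafkaProducer, KafkaConsumer  # pip install kafka-python

    producer = KafkaProducer(bootstrap_servers=bootstrap)
    for i in range(3):
        producer.send(topic, encode_event({"n": i}))
    producer.flush()  # block until the messages are actually sent

    # consumer_timeout_ms stops iteration after 5s of no new messages,
    # so the cell finishes instead of running indefinitely.
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap,
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,
    )
    return [json.loads(msg.value) for msg in consumer]
```

Calling `produce_and_consume()` in a single cell should return the three sent events once the broker is up; no second terminal or second notebook is needed.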