Tags: apache-spark, pyspark, apache-kafka, google-colaboratory, spark-streaming

Is it possible to run both a Kafka consumer and producer in Google Colab? How?


I'm fairly new to Kafka and came across a repo on GitHub (https://github.com/aber0016/Real_Time_Big_Data_Streaming_Spark_Kafka) that defines a Kafka consumer in one notebook and a producer in another. I want to know whether it's possible to run both the producer and the consumer in a single Google Colab notebook. As far as I know, the consumer runs in one terminal and the producer in another, and that setup doesn't seem to work on Colab.

Many thanks in advance for any help; I've been stuck on this for two weeks now.


Solution

  • Yes, it's possible. You need to use batch reading for consumption, though, since streaming jobs run indefinitely.

    Example - https://github.com/OneCricketeer/docker-stacks/blob/master/hadoop-spark/spark-notebooks/kafka-sql.ipynb

    You could also run both without using Spark, e.g. with a plain Python Kafka client such as kafka-python.
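To illustrate the non-Spark route, here is a minimal sketch of a producer and consumer in the same notebook cell. It assumes a Kafka broker is reachable at `localhost:9092` (e.g. started inside the Colab VM) and that the `kafka-python` package is installed; the topic name `demo` is just an example. The key trick for Colab is `consumer_timeout_ms`, which makes the consumer iterator stop instead of blocking forever, in the same spirit as using a batch read instead of a streaming job in Spark.

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize a dict to the UTF-8 JSON bytes Kafka messages carry."""
    return json.dumps(event).encode("utf-8")


def produce_and_consume(bootstrap="localhost:9092", topic="demo"):
    """Send a few messages, then read them back, all in one cell."""
    from kafka import KafkaProducer, KafkaConsumer  # pip install kafka-python

    producer = KafkaProducer(bootstrap_servers=bootstrap)
    for i in range(3):
        producer.send(topic, encode_event({"n": i}))
    producer.flush()  # block until the messages are actually sent

    # consumer_timeout_ms stops iteration after 5s of no new messages,
    # so the cell finishes instead of running indefinitely.
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap,
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,
    )
    return [json.loads(msg.value) for msg in consumer]
```

Calling `produce_and_consume()` in a single cell should return the three sent events once the broker is up; no second terminal or second notebook is needed.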