I am running pyspark on Google Colab. I have set up Kafka and added a CSV file to a topic. If I don't use structured streaming to read from Kafka, I am able to read the data and print it.
However, when I try to read the same data using spark structured streaming, the loop just keeps on running without anything getting printed on the terminal.
How do I print the data in this case? Any help will be much appreciated. Thanks!
Printing to the console doesn't work nicely in notebook environments like Colab or Databricks. What you can do instead is use the memory sink:
query = streaming_df.writeStream.format("memory").queryName("streaming_df").start()
Then, you can query your in-memory output using:
spark.sql("SELECT * FROM streaming_df").show()