apache-spark cassandra pyspark spark-cassandra-connector

How to create a Spark DataFrame from a Cassandra keyspace?

I have a local installation of Cassandra. I have to work in Spark on Google Colab, and I can already run queries against my local database. But I know it is possible to connect Spark and Cassandra more efficiently. I would like to create a DataFrame with data from a Cassandra keyspace. How do I do it?

My keyspace is called yelp_data. It contains the "reviews" and "business" tables.

In my project I would like a DataFrame, df = (data from my Cassandra keyspace). I use PySpark.

Solution

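Follow the documentation for the Spark Cassandra Connector: call spark.read with the format org.apache.spark.sql.cassandra and pass the table and keyspace as options. A DataFrame maps to a single table, not to a whole keyspace, so load one DataFrame per table, like this: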

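# Assumes `spark` is an existing SparkSession with the Spark Cassandra Connector on its classpath.
# Each load() returns a DataFrame backed by one Cassandra table.
reviews_df = spark.read.format("org.apache.spark.sql.cassandra")\
  .options(table="reviews", keyspace="yelp_data").load()
business_df = spark.read.format("org.apache.spark.sql.cassandra")\
  .options(table="business", keyspace="yelp_data").load()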
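
The snippet above needs a SparkSession that has the connector available and knows where Cassandra is running. Here is a minimal sketch of that setup, with some assumptions: the connector coordinates and version in CONNECTOR_PACKAGE are only an example and should match your Spark and Scala versions, the host and port are placeholders, and the helper names (build_session, read_options, load_table, load_keyspace) are made up for illustration. Also note that a Colab runtime cannot reach 127.0.0.1 on your own machine, so the host would have to be an address that Colab can actually connect to.

```python
# Sketch only: connector version, host, port, and helper names are assumptions.
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"
# Example coordinates; pick the build that matches your Spark/Scala version.
CONNECTOR_PACKAGE = "com.datastax.spark:spark-cassandra-connector_2.12:3.4.1"


def build_session(host="127.0.0.1", port="9042"):
    """Create a SparkSession that pulls in the connector and points at Cassandra."""
    # Imported lazily so the other helpers can be used without Spark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("yelp-cassandra")
        .config("spark.jars.packages", CONNECTOR_PACKAGE)
        .config("spark.cassandra.connection.host", host)
        .config("spark.cassandra.connection.port", port)
        .getOrCreate()
    )


def read_options(keyspace, table):
    """Options the connector needs to locate one table."""
    return {"keyspace": keyspace, "table": table}


def load_table(spark, keyspace, table):
    """Return a DataFrame backed by the given Cassandra table."""
    return (
        spark.read.format(CASSANDRA_FORMAT)
        .options(**read_options(keyspace, table))
        .load()
    )


def load_keyspace(spark, keyspace, tables):
    """Map each table name to its own DataFrame (one DataFrame per table)."""
    return {table: load_table(spark, keyspace, table) for table in tables}
```

With these helpers, load_keyspace(build_session(), "yelp_data", ["reviews", "business"]) would give a dict with one DataFrame per table, which is about as close as you get to "a DataFrame for the keyspace".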