I have a local installation of Cassandra. I have to work with Spark in Google Colab, and I can run queries against my local database. But I know it is possible to connect Spark and Cassandra more efficiently. I would like to create a dataframe with data from a Cassandra keyspace. How do I do it?
My keyspace is called yelp_data. It contains the "reviews" and "business" tables.
In my project I would like a dataframe such as df = (data from my Cassandra keyspace). I use PySpark.
Follow the documentation for the Spark Cassandra Connector and use spark.read
with the connector's format name and the table and keyspace options, like this:
reviews_df = spark.read.format("org.apache.spark.sql.cassandra")\
    .options(table="reviews", keyspace="yelp_data").load()
business_df = spark.read.format("org.apache.spark.sql.cassandra")\
    .options(table="business", keyspace="yelp_data").load()
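The reads above only work if the connector is on Spark's classpath and Spark knows where Cassandra is listening. Here is a minimal sketch of that setup, under a few assumptions that are mine and not from the question: the connector version (3.5.0 for Scala 2.12, which has to match your Spark build), the host and port defaults, and the helper names build_session, cassandra_read_options and load_table, which are hypothetical. Note that the host must be reachable from wherever Spark actually runs, so 127.0.0.1 only works when Spark and Cassandra are on the same machine.

```python
# Assumed connector coordinates; pick the version that matches your Spark/Scala build.
CONNECTOR_PACKAGE = "com.datastax.spark:spark-cassandra-connector_2.12:3.5.0"


def build_session(host="127.0.0.1", port=9042, package=CONNECTOR_PACKAGE):
    """Create a SparkSession that can talk to Cassandra (host/port are assumptions)."""
    # Imported here so the other helpers can be used without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.appName("yelp-cassandra")
        # Download the connector and put it on the classpath at startup.
        .config("spark.jars.packages", package)
        # Tell the connector where the Cassandra node is.
        .config("spark.cassandra.connection.host", host)
        .config("spark.cassandra.connection.port", str(port))
        .getOrCreate()
    )


def cassandra_read_options(keyspace, table):
    """Options that identify one Cassandra table for spark.read."""
    return {"keyspace": keyspace, "table": table}


def load_table(spark, keyspace, table):
    """Load one Cassandra table as a DataFrame, same call chain as in the answer."""
    return (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(**cassandra_read_options(keyspace, table))
        .load()
    )


# Example usage (needs pyspark and a reachable Cassandra node):
# spark = build_session()
# reviews_df = load_table(spark, "yelp_data", "reviews")
# business_df = load_table(spark, "yelp_data", "business")
```

Wrapping the read in a small function keeps the format string in one place when you load several tables from the same keyspace.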