apache-spark pyspark cassandra databricks spark-cassandra-connector

Change Spark configuration at runtime in Databricks


Is it possible to change Spark configuration properties at runtime?

I'm using Databricks, and my goal is to read a Cassandra table from a cluster used in production and, after some operations, write the results to another Cassandra table in a different cluster used for development.

Right now I connect to my Cassandra cluster via Spark configuration properties using:

spark.conf.set("spark.cassandra.connection.host", "cluster")
spark.conf.set("spark.cassandra.auth.username", "username")
spark.conf.set("spark.cassandra.auth.password", "password")

but if I try to change these properties at runtime, the write operations fail.


Solution

  • You can also specify options on the specific read/write operations, like this:

    df = spark.read \
      .format("org.apache.spark.sql.cassandra") \
      .options(**{
        "table": "words",
        "keyspace": "test",
        "spark.cassandra.connection.host": "host",
        # ... any other per-operation connector options
      }) \
      .load()
    

    See the documentation for more examples; a minimal sketch of the full two-cluster flow follows.
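
    Putting it together, the read can point at the production cluster and the write at the development cluster, with each operation carrying its own connection options so that neither depends on the session-level spark.conf settings. This is a rough sketch, not a verbatim recipe: the host names, credentials, keyspace, and table names (prod-host, dev-host, test, words, words_copy) are placeholders you would replace with your own.

    # Read from the production cluster using per-operation options
    df = spark.read \
      .format("org.apache.spark.sql.cassandra") \
      .options(**{
        "table": "words",
        "keyspace": "test",
        "spark.cassandra.connection.host": "prod-host",    # placeholder: production cluster
        "spark.cassandra.auth.username": "prod-user",      # placeholder credentials
        "spark.cassandra.auth.password": "prod-password",
      }) \
      .load()

    # ... transformations on df ...

    # Write to the development cluster with a different set of options
    df.write \
      .format("org.apache.spark.sql.cassandra") \
      .mode("append") \
      .options(**{
        "table": "words_copy",
        "keyspace": "test",
        "spark.cassandra.connection.host": "dev-host",     # placeholder: development cluster
        "spark.cassandra.auth.username": "dev-user",       # placeholder credentials
        "spark.cassandra.auth.password": "dev-password",
      }) \
      .save()

    Because both connections are scoped to their own read/write call, this should work within a single job without touching the session-level configuration at all.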