apache-spark pyspark cassandra databricks spark-cassandra-connector

Change Spark configuration at runtime in Databricks


Is it possible to change Spark configuration properties at runtime?

I'm using Databricks, and my goal is to read a Cassandra table from a cluster used in production and, after some operations, write the results to another Cassandra table in a different cluster used for development.

Right now I connect to my Cassandra cluster via Spark configuration properties using:

spark.conf.set("spark.cassandra.connection.host", "cluster")
spark.conf.set("spark.cassandra.auth.username", "username")
spark.conf.set("spark.cassandra.auth.password", "password")

but if I try to change these properties at runtime, the write operations fail.


Solution

  • You can also specify options on the specific read/write operations, like this:

    df = spark.read \
      .format("org.apache.spark.sql.cassandra") \
      .options(**{
        "table": "words",
        "keyspace": "test",
        "spark.cassandra.connection.host": "host",
        # ... any other per-operation connector options
      }) \
      .load()
    

    See the documentation for more examples; a minimal sketch of the full two-cluster flow follows.
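
    Putting it together, the read can point at the production cluster and the write at the development cluster, with each operation carrying its own connection options so that neither depends on the session-level spark.conf settings. This is a rough sketch, not a verbatim recipe: the host names, credentials, keyspace, and table names (prod-host, dev-host, test, words, words_copy) are placeholders you would replace with your own.

    # Read from the production cluster using per-operation options
    df = spark.read \
      .format("org.apache.spark.sql.cassandra") \
      .options(**{
        "table": "words",
        "keyspace": "test",
        "spark.cassandra.connection.host": "prod-host",    # placeholder: production cluster
        "spark.cassandra.auth.username": "prod-user",      # placeholder credentials
        "spark.cassandra.auth.password": "prod-password",
      }) \
      .load()

    # ... transformations on df ...

    # Write to the development cluster with a different set of options
    df.write \
      .format("org.apache.spark.sql.cassandra") \
      .mode("append") \
      .options(**{
        "table": "words_copy",
        "keyspace": "test",
        "spark.cassandra.connection.host": "dev-host",     # placeholder: development cluster
        "spark.cassandra.auth.username": "dev-user",       # placeholder credentials
        "spark.cassandra.auth.password": "dev-password",
      }) \
      .save()

    Because both connections are scoped to their own read/write call, this should work within a single job without touching the session-level configuration at all.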