Tags: pyspark, cassandra, databricks, azure-databricks, spark-cassandra-connector

Could not initialize class com.datastax.oss.driver.internal.core.config.typesafe.TypesafeDriverConfig


I'm using Azure Databricks to connect to Cassandra. My Cassandra instance is exposed on a specific port and accessible from cqlsh. The cqlsh `SHOW VERSION` command returns:

[cqlsh 6.0.0 | Cassandra 3.11.10 | CQL spec 3.4.4 | Native protocol v4]

I've created a cluster running on the following runtime:

7.3 LTS (includes Apache Spark 3.0.1, Scala 2.12)

I've installed the following libraries: com.datastax.oss:java-driver-core:4.12.0 and com.datastax.spark:spark-cassandra-connector_2.12:3.0.1

Now I'm trying to execute a simple query to load data with DataFrames:

(spark.read.format("org.apache.spark.sql.cassandra")
    .option("spark.cassandra.connection.host", ...)
    .option("spark.cassandra.auth.username", ...)
    .option("spark.cassandra.auth.password", ...)
    .option("table", ...)
    .option("keyspace", ...)
    .load())

In response I get: java.io.IOException: Failed to open native connection to Cassandra at :: Could not initialize class com.datastax.oss.driver.internal.core.config.typesafe.TypesafeDriverConfig

How can I correctly initialize connection?


Solution

  • You need to use spark-cassandra-connector-assembly (Maven Central) instead of spark-cassandra-connector. The reason is that the Spark Cassandra Connector uses a newer version of the Typesafe Config library than the one bundled with the Databricks runtime. The assembly artifact includes all necessary libraries as shaded dependencies, so there is no conflict. You also don't need to install java-driver-core separately; the assembly already includes it.

    You can find a more detailed explanation in the following blog post
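To make the fix concrete, here is a sketch of the resulting cluster setup and read, assuming the assembly artifact for Scala 2.12 / connector 3.0.1; the host, port, credentials, keyspace, and table names below are illustrative placeholders, not values from the question:

```python
# On the Databricks cluster (7.3 LTS, Spark 3.0.1 / Scala 2.12), install ONLY
# this library as a Maven coordinate:
#   com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.1
# Do not also install java-driver-core or the plain (non-assembly) connector.

# Connection settings can be set once on the session instead of repeating them
# on every read (all values here are placeholders):
spark.conf.set("spark.cassandra.connection.host", "cassandra.example.com")
spark.conf.set("spark.cassandra.connection.port", "9042")
spark.conf.set("spark.cassandra.auth.username", "cass_user")
spark.conf.set("spark.cassandra.auth.password", "cass_pass")

# With the connection configured on the session, only keyspace and table
# remain per-read options:
df = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .option("keyspace", "my_keyspace")  # placeholder keyspace
    .option("table", "my_table")        # placeholder table
    .load()
)
```

Setting the connection options on the session keeps per-read code short when several tables are read from the same Cassandra cluster; per-`.option()` overrides, as in the question's snippet, still work if different reads need different hosts or credentials.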