Tags: apache-spark, cassandra, apache-spark-sql, azure-databricks, spark-cassandra-connector

Databricks Spark Cassandra connectivity throwing exception: com.datastax.driver.core.exceptions.NoHostAvailableException


I have installed Cassandra on an Azure virtual machine and want to perform read/write operations from Azure Databricks. I have been following the official Databricks documentation, but it does not help me with the configuration.
Below are my code and configuration details:

%sh
ping -c 2 vmname.westeurope.cloudapp.azure.com

Response received:

PING vmname.westeurope.cloudapp.azure.com (13.69.10.10): 56 data bytes
--- vmname.westeurope.cloudapp.azure.com ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss
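ping relies on ICMP, which may be filtered by Azure or by the VM's network security group even when the Cassandra port is open, so 100% packet loss does not by itself prove the VM is unreachable. A more direct check from a notebook cell is to open a TCP connection to the CQL port, 9042 by default. This is a minimal sketch using only java.net; the helper name `canConnect` and the timeout value are hypothetical choices, not part of Databricks or the connector.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Hypothetical helper: returns true if a TCP connection to host:port
// can be opened within timeoutMs milliseconds, false otherwise.
def canConnect(host: String, port: Int, timeoutMs: Int = 3000): Boolean = {
  val socket = new Socket()
  try {
    socket.connect(new InetSocketAddress(host, port), timeoutMs)
    true
  } catch {
    // Covers refused connections, timeouts and unresolvable hostnames
    case _: IOException => false
  } finally {
    socket.close()
  }
}
```

In the notebook, `canConnect("vmname.westeurope.cloudapp.azure.com", 9042)` returning false points at networking (NSG rules, firewall, or Cassandra listening only on a private interface) rather than at the Spark configuration.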

// Define the cluster name and Cassandra host name
val sparkClusterName = "adbazewdobucluster"
val cassandraHostIP = "vmname.westeurope.cloudapp.azure.com"

dbutils.fs.put(s"/databricks/init/$sparkClusterName/cassandra.sh",
  s"""
     #!/usr/bin/bash
     echo '[driver]."spark.cassandra.connection.host" = "$cassandraHostIP"' >> /home/ubuntu/databricks/common/conf/cassandra.conf
   """.trim, true)

// Set the IP of the Cassandra server
spark.conf.set("spark.cassandra.connection.host", "127.0.0.1")

// Verify that the Spark conf is set properly
spark.conf.get("spark.cassandra.connection.host")
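For comparison, here is a minimal sketch of the connector settings pointed at the VM hostname held in `cassandraHostIP` rather than at 127.0.0.1, assuming the VM accepts CQL connections on the default port 9042. `spark.cassandra.connection.host` and `spark.cassandra.connection.port` are spark-cassandra-connector properties; the helper `cassandraSettings` is hypothetical.

```scala
// Hypothetical helper collecting the spark-cassandra-connector
// connection properties for a single contact point.
def cassandraSettings(host: String, port: Int = 9042): Map[String, String] =
  Map(
    "spark.cassandra.connection.host" -> host, // the Cassandra VM, not the Databricks driver
    "spark.cassandra.connection.port" -> port.toString
  )

val settings = cassandraSettings("vmname.westeurope.cloudapp.azure.com")
settings.foreach { case (key, value) => println(s"$key = $value") }
```

In a notebook, the map can be applied with `settings.foreach { case (k, v) => spark.conf.set(k, v) }` before calling `spark.read`; putting the same key/value pairs in the cluster's Spark config should apply them from cluster start instead.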

After applying all of this configuration in Spark, I try to retrieve the records from a table in Cassandra, which throws an exception:

val df = sqlContext
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map( "table" -> "words_new", "keyspace" -> "test"))
  .load
df.explain

Exception:

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.TransportException: [/127.0.0.1:9042] Cannot connect))
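The exception reports `/127.0.0.1:9042`, the same value passed to `spark.conf.set`, so the connector is looking for Cassandra on the Databricks driver node itself rather than on the Azure VM. A small guard, assuming you want to fail fast before calling `spark.read`, can reject loopback contact points. `requireRemoteHost` is a hypothetical name; the check relies on `java.net.InetAddress.isLoopbackAddress`.

```scala
import java.net.InetAddress

// Hypothetical guard: throws IllegalArgumentException if the configured
// Cassandra host resolves to a loopback address (127.0.0.0/8 or ::1).
def requireRemoteHost(host: String): Unit = {
  val address = InetAddress.getByName(host)
  require(
    !address.isLoopbackAddress,
    s"spark.cassandra.connection.host=$host points at this machine (${address.getHostAddress})"
  )
}
```

In the notebook this would be called as `requireRemoteHost(spark.conf.get("spark.cassandra.connection.host"))`, which would have failed with the configuration shown above.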

I have checked that my Cassandra DB is running, and read/write operations work fine when I connect to it directly.
So my question is: am I applying the configuration the right way? If not, how do I access Cassandra from a Databricks notebook?
I am using Scala for Spark, and my cluster and driver versions are as follows:

Databricks Runtime Version
6.2 (includes Apache Spark 2.4.4, Scala 2.11)

spark-cassandra-connector
com.datastax.spark:spark-cassandra-connector_2.11:2.4.1

Cassandra version: 3.11.4
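The `_2.11` suffix in the connector coordinate is the Scala binary version the artifact was built for, and it has to match the cluster's Scala version (2.11 on Databricks Runtime 6.2). A small sketch, assuming coordinates of the form `group:artifact_scalaVersion:version`, extracts that suffix so it can be compared with the running Scala; `scalaSuffix` is a hypothetical helper.

```scala
// Hypothetical helper: extracts the Scala binary suffix from a Maven
// coordinate such as "com.datastax.spark:spark-cassandra-connector_2.11:2.4.1".
def scalaSuffix(coordinate: String): Option[String] =
  coordinate.split(":") match {
    case Array(_, artifact, _) =>
      val idx = artifact.lastIndexOf('_')
      if (idx >= 0) Some(artifact.substring(idx + 1)) else None
    case _ => None
  }

// Scala binary version of the running library, e.g. "2.11" for "2.11.12"
val runtimeBinary = scala.util.Properties.versionNumberString.split('.').take(2).mkString(".")
val connector = "com.datastax.spark:spark-cassandra-connector_2.11:2.4.1"
println(s"connector built for ${scalaSuffix(connector).getOrElse("?")}, cluster runs $runtimeBinary")
```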

Solution

  • If you're running on Azure, make sure to set broadcast_rpc_address to the VM's public IP address or DNS hostname. These settings should work for you:

    Set rpc_address to the IP address of the network interface attached to your VM (on Windows, the Hyper-V interface):

    rpc_address: <private IP of your VM>

    Set broadcast_rpc_address to the public IP; external clients should get a response from Cassandra on this address on port 9042:

    broadcast_rpc_address: <public IP or hostname.westeurope.cloudapp.azure.com>

    Leave listen_address at its default, localhost / 127.0.0.1:

    listen_address: localhost
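The settings above follow the rules described in cassandra.yaml: broadcast_rpc_address is the RPC address advertised to drivers and other nodes, it falls back to rpc_address when left blank, it cannot be 0.0.0.0, and it must be set when rpc_address is 0.0.0.0. The toy model below is plain Scala, not Cassandra code; the case class, the function name, and the private IP 10.0.0.4 are all hypothetical. It computes which address a remote client such as the Databricks driver would be told to use.

```scala
// Toy model of the cassandra.yaml rules for the client-facing address;
// an illustration only, not code from Cassandra itself.
final case class RpcConfig(rpcAddress: String, broadcastRpcAddress: Option[String])

def advertisedRpcAddress(conf: RpcConfig): Either[String, String] =
  conf.broadcastRpcAddress match {
    case Some("0.0.0.0") =>
      Left("broadcast_rpc_address cannot be 0.0.0.0")
    case Some(addr) =>
      Right(addr) // clients are told to use the broadcast address
    case None if conf.rpcAddress == "0.0.0.0" =>
      Left("broadcast_rpc_address must be set when rpc_address is 0.0.0.0")
    case None =>
      Right(conf.rpcAddress) // a blank broadcast address falls back to rpc_address
  }

// Hypothetical private IP for rpc_address, public DNS name for broadcast_rpc_address
val azure = RpcConfig("10.0.0.4", Some("vmname.westeurope.cloudapp.azure.com"))
println(advertisedRpcAddress(azure)) // prints Right(vmname.westeurope.cloudapp.azure.com)
```

Without the broadcast setting, the model falls back to the private address, which a client outside the VM's virtual network generally cannot reach.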