apache-spark, cassandra, spark-jobserver

Spark JobServer can use Cassandra as SharedDb


I have been researching how to configure the Spark JobServer backend (SharedDb) with Cassandra.

The SJS documentation cites Cassandra as one of the shared DBs that can be used.

Here is the documentation part:

Spark Jobserver offers a variety of options for backend storage such as:

H2/PostgreSQL or other SQL databases

Cassandra

A combination of a SQL DB or ZooKeeper with HDFS

However, I couldn't find any configuration example for this.

Does anyone have an example, or can anyone help me configure it?

Edited:

I want to use Cassandra to store metadata and jobs from Spark JobServer, so that I can reach any of the servers through a proxy sitting in front of them.


Solution

  • Cassandra was supported in previous versions of Jobserver. You just needed to have Cassandra running, add the correct settings to your Jobserver configuration file (see https://github.com/spark-jobserver/spark-jobserver/blob/0.8.0/job-server/src/main/resources/application.conf#L60), and specify spark.jobserver.io.JobCassandraDAO as the DAO.

    However, the Cassandra DAO was recently deprecated and removed from the project, because it was not widely used or maintained by the community.
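
    If you are on a version that still ships the Cassandra DAO (e.g. 0.8.x), the configuration would look roughly like the sketch below. This is a hedged example: the exact key names and defaults should be verified against the linked application.conf for your version, and the host, user, and password values are placeholders.

    ```hocon
    # Sketch of a Jobserver config using the (now removed) Cassandra DAO.
    # Key names follow the 0.8.0 application.conf; verify against your version.
    spark {
      jobserver {
        # Tell Jobserver to persist job/metadata via the Cassandra DAO
        jobdao = spark.jobserver.io.JobCassandraDAO

        cassandra {
          # Contact points for your Cassandra cluster (placeholders)
          hosts = ["cassandra-host-1:9042", "cassandra-host-2:9042"]
          user = ""      # fill in if authentication is enabled
          password = ""
        }
      }
    }
    ```

    Since all Jobserver instances would read and write the same Cassandra keyspace, this is what would let multiple servers behind a proxy share state, which matches the use case in the question.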