Tags: postgresql, scala, apache-spark, sbt, ibm-cloud

Connecting to a PostgreSQL DB in a Spark application running on the Bluemix Apache-Spark service


I have a problem connecting to my PostgreSQL database from a Spark application launched on a cluster of the Bluemix Apache-Spark service using the spark-submit.sh script.

The code in my Scala file is:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val conf = new SparkConf().setAppName("My demo").setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val driver = "org.postgresql.Driver"
    val url = "jdbc:postgresql://aws-us-east-1-portal.16.dblayer.com:10394/tennisdb?user=***&password=***"
    println("create")
    try {
      // register the JDBC driver, then load the table through the Spark SQL JDBC source
      Class.forName(driver)
      val jdbcDF = sqlContext.read.format("jdbc")
        .options(Map("url" -> url, "driver" -> driver, "dbtable" -> "inputdata"))
        .load()
      jdbcDF.show()
      println("success")
    } catch {
      case e: Throwable =>
        println(e.toString)
        println("Exception")
    }
    sc.stop()
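
Note: configuration set directly on SparkConf takes precedence over flags passed to spark-submit, so the hard-coded setMaster("local") would keep the job in local mode even when submitted to the cluster. A minimal sketch of the alternative, leaving the master to spark-submit.sh:

    // omit setMaster so the --master value passed to spark-submit.sh takes effect
    val conf = new SparkConf().setAppName("My demo")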

I'm using an sbt file to resolve the dependencies. The contents of my sbt file are:

 name := "spark-sample"

 version := "1.0"

 scalaVersion := "2.10.4"

 // Adding spark modules dependencies

 val sparkModules = List("spark-core",
   "spark-streaming",
   "spark-sql",
   "spark-hive",
   "spark-mllib",
   "spark-repl",
   "spark-graphx"
 )

 val sparkDeps = sparkModules.map( module => "org.apache.spark" % s"${module}_2.10" % "1.4.0" )     

 libraryDependencies ++= sparkDeps

 libraryDependencies += "org.postgresql" % "postgresql" % "9.4-1201-jdbc41"
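
Since the Bluemix Apache-Spark service supplies Spark itself at runtime, the Spark modules could arguably be marked "provided" so sbt compiles against them without expecting them in the deployed jar; a sketch of that variant (an assumption, not part of the original build):

    // compile against the Spark modules but let the cluster supply them at runtime
    val sparkDeps = sparkModules.map(module => "org.apache.spark" % s"${module}_2.10" % "1.4.0" % "provided")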

Then I use the sbt package command to create a jar for my application, so that I can run it on a cluster using the Bluemix Apache-Spark service. The jar is created successfully and the application runs locally without any errors. But when I submit the application to the Bluemix Apache-Spark service using the spark-submit.sh script, I get a ClassNotFoundException for org.postgresql.Driver.
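
(Worth noting: sbt package bundles only the application's own classes, not its library dependencies, which is why the driver resolves locally but not on the cluster. Besides the --jars approach in the solution below, a common remedy is a fat jar built with the sbt-assembly plugin; a minimal sketch, assuming a plugin version compatible with your sbt installation:)

    // project/plugins.sbt -- the version shown is an assumption; check the sbt-assembly releases
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

Running sbt assembly then produces a single jar that already contains org.postgresql.Driver.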


Solution

  • Another easy way to do this: just put all the library jars in the directory where your application jar is, and tell spark-submit.sh to look for them.

    [charles@localhost tweetoneanalyzer]$ spark-submit --jars $(echo application/*.jar | tr ' ' ',') --class "SparkTweets" --master local[3] application/spark-sample.jar

    In the above example, spark-submit will upload all the jars under the application folder (as indicated by the --jars flag) to the server, so you should put any library jars you use there, in your case postgresql-9.1-901-1.jdbc4.jar, and specify the application jar to be run as the final argument, application/spark-sample.jar.
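
    Adapted to the question's build, the invocation might look like the following sketch (the main class name Main is a placeholder, and the driver jar name matches the version declared in the sbt file):

        spark-submit --jars application/postgresql-9.4-1201-jdbc41.jar --class "Main" --master local[3] application/spark-sample.jar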

    Thanks,

    Charles.