apache-spark, ibm-cloud, data-science-experience, spark-cloudant

Spark-cloudant package 1.6.4 loaded by %AddJar does not get used by notebook


I'm trying to use the latest spark-cloudant package with a notebook:

%AddJar -f https://github.com/cloudant-labs/spark-cloudant/releases/download/v1.6.4/cloudant-spark-v1.6.4-167.jar

Which outputs:

Starting download from https://github.com/cloudant-labs/spark-cloudant/releases/download/v1.6.4/cloudant-spark-v1.6.4-167.jar
Finished download of cloudant-spark-v1.6.4-167.jar

Followed by:

// sourceDB (host, credentials, database name) is defined earlier in the
// notebook; a hypothetical sketch of its shape follows below
val dfReader = sqlContext.read.format("com.cloudant.spark")
dfReader.option("cloudant.host", sourceDB.host)
sourceDB.username.filter(_.nonEmpty).foreach(dfReader.option("cloudant.username", _))
sourceDB.password.filter(_.nonEmpty).foreach(dfReader.option("cloudant.password", _))
val df = dfReader.load(sourceDB.database).cache()
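
For reference, sourceDB just holds the connection details. A minimal sketch of the shape it would need (all names and values here are hypothetical, not from the actual notebook):

case class SourceDB(
  host: String,               // e.g. "<account>.cloudant.com"
  username: Option[String],
  password: Option[String],
  database: String            // e.g. "ratingdb"
)

val sourceDB = SourceDB("<account>.cloudant.com", Some("<username>"), Some("<password>"), "ratingdb")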

Running that reader code outputs:

Use connectorVersion=1.6.3, dbName=ratingdb, indexName=null, viewName=null, jsonstore.rdd.partitions=5, jsonstore.rdd.maxInPartition=-1, jsonstore.rdd.minInPartition=10, jsonstore.rdd.requestTimeout=900000, bulkSize=20, schemaSampleSize=1

The connector version reported is 1.6.3, not the 1.6.4 that %AddJar just downloaded. My notebook environment is:

Scala 2.10 with Spark 1.6

I've tried restarting the kernel but that didn't help.

Other debug information:

Server Information:

You are using Jupyter notebook.

The version of the notebook server is 4.2.0 and is running on:
Python 2.7.11 (default, Jun 24 2016, 12:41:03) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)]

Current Kernel Information:

IBM Spark Kernel

Update

I tried the following:

import sys.process._

// Intended to create ~/data/libs/scala-2.10 and download the 1.6.4 jar there
"test -d ~/data/libs/scala-2.10" #|| "mkdir -p ~/data/libs/scala-2.10" !
"wget -c -O ~/data/libs/scala-2.10/cloudant-spark-v1.6.4-167.jar https://github.com/cloudant-labs/spark-cloudant/releases/download/v1.6.4/cloudant-spark-v1.6.4-167.jar" !
"ls ~/data/libs/scala-2.10/" !

println("Now restart the kernel")

Unfortunately, this didn't work; version 1.6.3 was still being used.

Update 2

It appears that the tilde (~) was not being resolved to my HOME folder in the code above.
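
This is easy to confirm: sys.process runs the command directly rather than through a shell, so the ~ is passed to the program literally. A quick hypothetical check:

import sys.process._

// No shell means no tilde expansion: "ls" receives the literal path
// "~/data/libs" and fails with a nonzero exit code.
val rc = "ls ~/data/libs".!
println(s"exit code: $rc")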

See the answer for the working solution.


Solution

  • Running the following code from a Scala notebook worked for me:

    import sys.process._
    
    // Resolve HOME explicitly, since sys.process does not expand "~"
    val HOME = sys.env("HOME")
    val DESTDIR = s"${HOME}/data/libs/scala-2.10"
    
    // Create the destination directory if it does not already exist
    s"test -d ${DESTDIR}" #|| s"mkdir -p ${DESTDIR}" !
    // Download the 1.6.4 jar into the kernel's library folder
    s"wget -q -c -O ${DESTDIR}/cloudant-spark-v1.6.4-167.jar https://github.com/cloudant-labs/spark-cloudant/releases/download/v1.6.4/cloudant-spark-v1.6.4-167.jar" !
    // Verify the jar is in place
    s"ls ${DESTDIR}/" !
    
    

    I have also asked product management for the Spark service to officially upgrade this library.
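
    After restarting the kernel, re-running the reader should pick up the new jar. A hypothetical check (substitute real credentials); the connector banner in the output should now report connectorVersion=1.6.4:

    // Hypothetical verification after the kernel restart
    val df = sqlContext.read.format("com.cloudant.spark")
      .option("cloudant.host", "<account>.cloudant.com")
      .option("cloudant.username", "<username>")
      .option("cloudant.password", "<password>")
      .load("ratingdb")
    // Expected log line: Use connectorVersion=1.6.4, dbName=ratingdb, ...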