data-science-experience

How to add spark packages to Spark R notebook on DSX?


The Spark documentation shows how a Spark package can be added:

sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")

I believe this can only be used when initialising the session.

How can we add Spark packages for SparkR when using a notebook on DSX?


Solution

  • Use the pixiedust package manager to install the Avro package:

    pixiedust.installPackage("com.databricks:spark-avro_2.11:3.0.0")

    http://datascience.ibm.com/docs/content/analyze-data/Package-Manager.html

    Install it from a Python 1.6 kernel, since pixiedust is importable in Python. (Remember that this installs the package at the level of your Spark instance.) Once it is installed, restart the kernel, switch to the R kernel, and read the Avro file like this:

    df1 <- read.df("episodes.avro", source = "com.databricks.spark.avro", header = "true")

    head(df1)

    Complete notebook:

    https://github.com/charles2588/bluemixsparknotebooks/raw/master/R/sparkRPackageTest.ipynb

    Thanks, Charles.
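
The install step in the answer above omits the import and assumes the coordinate string is well formed. A minimal sketch of that Python-kernel cell follows, assuming pixiedust is already available on the kernel. The names `parse_coordinate` and `install` are hypothetical helpers added here for illustration, and the check only accepts the three-part `group:artifact:version` form used in the answer; it is not part of pixiedust itself.

```python
def parse_coordinate(coordinate):
    """Split a Maven coordinate 'group:artifact:version' into its three parts.

    Hypothetical helper: it only handles the three-part form shown in the answer.
    """
    parts = coordinate.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected group:artifact:version, got %r" % coordinate)
    return tuple(parts)


def install(coordinate):
    """Validate the coordinate, then hand it to pixiedust (Python kernel only)."""
    parse_coordinate(coordinate)
    import pixiedust  # imported lazily so parse_coordinate works without it
    return pixiedust.installPackage(coordinate)


# In the notebook's Python kernel you would then run:
# install("com.databricks:spark-avro_2.11:3.0.0")
```

After the install call returns, the kernel restart and the switch to the R kernel proceed as described in the answer.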