I've been trying to follow along with this blog post:
Using Spark 2.1 with the built-in Hadoop 2.7, running locally, I can save a model.
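The post does not include the saving code. The following is a hedged sketch of what saving a pipeline model can look like in Spark 2.1. It fits a small text-classification pipeline modeled on the Tokenizer/HashingTF/LogisticRegression example in the Spark ML Pipelines guide, then writes it with `model.write.overwrite().save(...)`. The training rows, the object name `SaveModelSketch`, and the path are hypothetical placeholders, not the actual model from this question.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

// Hypothetical example: fits a tiny text pipeline and saves it to `path`.
object SaveModelSketch {
  def run(path: String): PipelineModel = {
    // Local mode: no cluster and no spark-submit involved.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("saveModelSketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up training rows: (id, text, label).
    val training = Seq(
      (0L, "a b c d e spark", 1.0),
      (1L, "b d", 0.0),
      (2L, "spark f g h", 1.0),
      (3L, "hadoop mapreduce", 0.0)
    ).toDF("id", "text", "label")

    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF()
      .setNumFeatures(1000)
      .setInputCol(tokenizer.getOutputCol)
      .setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.001)

    val model = new Pipeline().setStages(Array(tokenizer, hashingTF, lr)).fit(training)
    // overwrite() avoids an error if the target directory already exists.
    model.write.overwrite().save(path)
    model
  }
}
```

A `PipelineModel` saved this way is a directory (with `metadata` and `stages` subdirectories), which is what `PipelineModel.load` expects to read back.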
However, if I try to load the model from a regular Scala (sbt console) shell, HDFS fails to load:
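// Plain sbt console session: no spark-submit or spark-shell.
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.ml.PipelineModel
val sc = new SparkContext(new SparkConf().setMaster("local[1]").setAppName("myApp"))
// Relative path, resolved against the default Hadoop filesystem (fs.defaultFS).
val model = PipelineModel.load("mymodel.model")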
I get this error:
java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.DistributedFileSystem could not be instantiated
Is it in fact possible to use a Spark model without calling spark-submit or spark-shell? The article I linked to is the only one I've seen mentioning such functionality.
My build.sbt uses the following dependencies:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0",
  "org.apache.spark" %% "spark-sql" % "2.1.0",
  "org.apache.spark" %% "spark-hive" % "2.1.0",
  "org.apache.spark" %% "spark-mllib" % "2.1.0",
  "org.apache.hadoop" % "hadoop-hdfs" % "2.7.0"
)
In both cases I am using Scala 2.11.8.
Edit: Okay, it looks like including this dependency was the source of the problem:
"org.apache.hadoop" % "hadoop-hdfs" % "2.7.0"
I removed that line and the problem went away.
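With that dependency removed, loading works from a plain JVM process, so spark-submit is not required. Below is a minimal hedged sketch of such a standalone app, assuming Spark 2.1 in local mode. The original model is not available, so the sketch saves a stand-in pipeline containing only a `Tokenizer` to a hypothetical path and then loads it back with `PipelineModel.load`. The object name `LoadModelSketch`, the method `roundTrip`, and the path are illustrative only.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.feature.Tokenizer
import org.apache.spark.sql.SparkSession

// Hypothetical standalone app: can be started with `sbt run`,
// without spark-submit or spark-shell.
object LoadModelSketch {
  def roundTrip(path: String, texts: Seq[String]): Seq[Seq[String]] = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("loadModelSketch")
      .getOrCreate()
    import spark.implicits._

    val df = texts.toDF("text")

    // Stand-in model: a pipeline with a single Tokenizer stage, saved locally.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    new Pipeline().setStages(Array(tokenizer)).fit(df).write.overwrite().save(path)

    // The step the question is about: loading a saved PipelineModel outside spark-shell.
    val loaded = PipelineModel.load(path)
    loaded.transform(df).select("words").collect().map(_.getSeq[String](0)).toSeq
  }

  def main(args: Array[String]): Unit =
    roundTrip("sketch-tokenizer.model", Seq("Hello Spark ML")).foreach(println)
}
```

`Tokenizer` lowercases its input and splits on whitespace, so the sample sentence comes back as the words `hello`, `spark`, and `ml`.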
Also, if your model is saved locally, you can remove HDFS from your configuration. This should prevent Spark from attempting to instantiate HDFS.
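One way to act on that suggestion, assuming "configuration" here means the Hadoop `fs.defaultFS` setting (for example, one inherited from a `core-site.xml` that points at `hdfs://`), is to pin it to the local filesystem through a `spark.hadoop.*` property. You can also load the model with an explicit `file://` URI. This is a hedged sketch. `LocalFsSketch`, `localSession`, and `loadLocal` are hypothetical names. The setting only controls which filesystem paths resolve to. It does not by itself change which `FileSystem` providers are on the classpath, so it does not replace dropping the `hadoop-hdfs` dependency.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession

// Hypothetical helper that pins the default filesystem to the local one.
object LocalFsSketch {
  // Must run before any other SparkContext is created in this JVM,
  // otherwise getOrCreate() returns the existing context unchanged.
  def localSession(): SparkSession = {
    val conf = new SparkConf()
      .setMaster("local[1]")
      .setAppName("localFsSketch")
      // spark.hadoop.* entries are copied into the Hadoop Configuration,
      // so scheme-less paths resolve against file:/// rather than hdfs://.
      .set("spark.hadoop.fs.defaultFS", "file:///")
    SparkSession.builder().config(conf).getOrCreate()
  }

  // An explicit file:// URI does not depend on fs.defaultFS at all.
  def loadLocal(absolutePath: String): PipelineModel = {
    localSession() // ensure the session and its Hadoop config exist first
    PipelineModel.load("file://" + absolutePath)
  }
}
```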