I am trying to index data from the Spark shell into Solr. My Solr is deployed in local (standalone) mode.
I know that the same thing can be done for cloud mode with:
val collection_name = "new_core"
val zk_host = "solr1:2181,solr2:2181,solr3:2181"
val options = Map(
  "zkhost" -> zk_host,
  "collection" -> collection_name
)
df.write.format("solr").options(options).mode(org.apache.spark.sql.SaveMode.Overwrite).save()
However, I am not able to replicate this for local mode.
What I have tried:
val corename = "new_core"
val zk_host = "localhost:2181"
val options = Map(
  "zkhost" -> zk_host,
  "collection" -> corename
)
df.write.format("solr").options(options).mode(org.apache.spark.sql.SaveMode.Overwrite).save()
This does not work. Please suggest a solution.
I was able to index data into Solr running in local mode. The df.write method did not work for me, but I found an alternative method.
/opt/solr-7.2.0/bin/post -c new_core /path/to/file/for/indexing
This works from the command line, not from spark-shell. If you want to run it through the Spark shell, do:
// Read the data and write it out as CSV (here to a Hadoop path)
val df = spark.read.load("path to file")
df.write.format("csv").option("header", true).save("somehadooppath/myfile")

// Copy the CSV from HDFS to the local filesystem, then index it with Solr's post tool
import sys.process._
"hadoop fs -get somehadooppath/myfile/ somelocalpath/myfile".!
"/opt/solr-7.2.0/bin/post -c <new core name> somelocalpath/myfile".!