
Index data in Solr (local mode) from the Spark shell


I am trying to index data from the Spark shell into Solr. My Solr instance is deployed in local mode.

I know that the same thing can be done for cloud mode with:

val collection_name = "new_core"
val zk_host = "solr1:2181,solr2:2181,solr3:2181"
val options = Map(
  "zkhost" -> zk_host,
  "collection" -> collection_name
)
df.write.format("solr").options(options).mode(org.apache.spark.sql.SaveMode.Overwrite).save()

However, I am not able to replicate this for local mode.

What I have tried:

val corename = "new_core"
val zk_host = "localhost:2181"
val options = Map(
  "zkhost" -> zk_host,
  "collection" -> corename
)
df.write.format("solr").options(options).mode(org.apache.spark.sql.SaveMode.Overwrite).save()

This does not work. Please suggest a solution.


Solution

  • I was able to index data into my local Solr. The df.write method did not work, but I found an alternative method:

    /opt/solr-7.2.0/bin/post -c new_core /path/to/file/for/indexing
    

    This works on the command line, not in the spark-shell. If you want to run it from the spark-shell, do:

    // Read the data to be indexed (replace "path to file" with your input path)
    val df = spark.read.load("path to file")

    // Write it out as CSV with a header row so the Solr post tool can map the
    // columns to fields
    df.write.format("csv").option("header", true).save("somehadooppath/myfile")

    // Shell out from the spark-shell: copy the CSV directory from HDFS to the
    // local filesystem, then index it with Solr's post tool
    import sys.process._

    "hadoop fs -get somehadooppath/myfile/ somelocalpath/myfile".!
    "/opt/solr-7.2.0/bin/post -c <new core name> somelocalpath/myfile".!