I am running into problems while trying to index multiple documents into Solr from Spark Streaming, using SolrJ. I parse and index each record of each micro-batch.
In the code below, the first method (tagged //method 1) works as expected. The second method (tagged //method 2) does nothing at all; it does not even fail. In the first method I index one record per partition, which is useless but functional. In the second method I convert each element of the partition into a document and then try to index each of them, without success: no records show up in the collection.
I use SolrJ 4.10 and Spark 2.2.1.
//method 1
myDStream.foreachRDD { rdd => rdd.foreachPartition { records =>
  val solrServer = new HttpSolrServer(collectionUrl)
  val document = new SolrInputDocument()
  document.addField("key", "someValue")
  ...
  solrServer.add(document)
  solrServer.commit()
}}
//method 2
myDStream.foreachRDD { rdd => rdd.foreachPartition { records =>
  val solrServer = new HttpSolrServer(collectionUrl)
  records.map { record =>
    val document = new SolrInputDocument()
    document.addField("key", record.key)
    ...
    solrServer.add(document)
    solrServer.commit()
  }
}}
I would like to understand why the second method does not work, and to find a working way to index multiple documents.
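My own (unconfirmed) suspicion is that Scala's Iterator.map is lazy: the block containing add and commit only runs if the resulting iterator is actually consumed, and in method 2 nothing ever consumes it. A minimal illustration outside Spark and Solr:

val records = Iterator("a", "b", "c")

// Nothing is printed here: map only builds a new lazy iterator, it does not run the body.
val mapped = records.map { r => println(s"indexing $r"); r }

// Only forcing the iterator (foreach, toList, size, ...) actually runs the side effects.
mapped.foreach(_ => ())   // prints "indexing a", "indexing b", "indexing c"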
The solution was to process the records through the RDD itself rather than through the partition iterator:
myDStream.foreachRDD { rdd => rdd.foreach { record =>
  val solrServer = new HttpSolrServer(collectionUrl)
  val document = new SolrInputDocument()
  document.addField("key", record.key)
  ...
  solrServer.add(document)
  solrServer.commit()
}}
See EricLavault's comment above for more information about the suspected source of the issue.
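For reference, here is a sketch of how the per-partition variant could be made to work while avoiding one connection and one commit per record. It assumes the lazy-iterator explanation above and the SolrJ 4.10 HttpSolrServer API (add also accepts a collection of documents); I have not tested it:

import scala.collection.JavaConverters._
import org.apache.solr.client.solrj.impl.HttpSolrServer
import org.apache.solr.common.SolrInputDocument

myDStream.foreachRDD { rdd => rdd.foreachPartition { records =>
  val solrServer = new HttpSolrServer(collectionUrl)
  // toList forces the iterator, so the documents are actually built.
  val documents = records.map { record =>
    val document = new SolrInputDocument()
    document.addField("key", record.key)
    // ... add the remaining fields here
    document
  }.toList
  // Send the whole batch and commit once per partition instead of once per record.
  if (documents.nonEmpty) {
    solrServer.add(documents.asJava)
    solrServer.commit()
  }
}}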