Search code examples
apache-sparksolrsolrjlucidworks

Spark-Solr Connector trying to add already existing field with stored=true


I am using Spark-Solr connector 3.4.0 with Solr cloud version 7.6.0 in a Spark 2.2.1 Cluster . We have an existing Solr collection with a predefined schema for it. Most of the fields have the stored parameter set to true, but there are certain fields where we explicitly set stored=false. When we try to push data to Solr using the spark-solr connector, we get the following error-

org.apache.solr.api.ApiBag$ExceptionWithErrObject: error processing commands, errors: [{add-field={name=taxonomy, indexed=true, multiValued=true, docValues=true, stored=true, type=string},  errorMessages=[Field 'item_id_channel' already exists.
]}],
   at org.apache.solr.handler.SchemaHandler.handleRequestBody(SchemaHandler.java:92)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)
   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)

The error says the item_id_channel already exists, but this error is only raised for fields for which we have defined stored=false (in the Solr schema). I get that the connector wishes to create the schema again for some reason, but it sets the stored parameter to true which clashes with the predefined schema definition on Solr for this field.

My question is - Is there a way to tell the connector (probably through some option?) that we want the stored to be set to true for certain fields? And probably a generic way to define other solr parameters for the fields?


Solution

  • We found the issue that was causing the error. There was a bug in older versions of spark-solr connector, because of which the connector was trying to add existing fields to the solr schema in case the value of stored was true. This was fixed in 3.5.5 release. Hence, once we upgraded our connector to version 3.5.14, the ingestion stared working without any errors.