Solr - Upload XML file, process with XSLT and supply extra field values in URL


I'm trying to upload an XML document (an RSS feed) to Solr. I use this call to index the file:

curl "http://localhost:8983/solr/1-3/update?commit=true&commitWithin=10000&tr=updateXml.xsl&literalsOverride=true&literal.client_uid=3" -H "Content-Type: text/xml" --data-binary @myfile.xml

The core name is 1-3. The file is processed correctly, and I can search all the products and fields specified in schema.xml, as long as I either leave client_uid out of the schema or make it an optional field.

client_uid is an extra field whose value I'd like to supply in the URL, because the documents themselves don't carry it:

<field name="client_uid" type="long" indexed="true" stored="true" multiValued="false" required="true"/>

My file contains around 22,000 documents. I try to supply the value via the literal.client_uid parameter in the URL, but I get this error:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">3007</int></lst><lst name="error"><str name="msg">[doc=117755] missing required field: client_uid</str><int name="code">400</int></lst>
</response>

I'm using Solr 5.4.0

What is wrong?


Solution

  • Figured it out. As @Karsten R. explained, the literal.client_uid parameter has no effect here because the request handlers are different: the UpdateRequestHandler behind /update doesn't support literal.* parameters.

    I decided to use an updateRequestProcessorChain (in solrconfig.xml) and built a .jar library containing a new UpdateRequestProcessorFactory class, which I included in the processor chain.

    Snippet from solrconfig.xml:

    <updateRequestProcessorChain name="mychain">
      <processor class="mypackage.solr.MyNewProcessorFactory"/>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>
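
    The order of the processors in the chain matters: each processor handles the document and then passes it to the next one, and in Solr the RunUpdateProcessorFactory step is the one that performs the actual index update, so the custom processor has to come before it. The following is a simplified, plain-Java model of that hand-off, not Solr's real classes. The names SketchProcessor, AddClientIdSketch and IndexSketch are hypothetical and exist only for this illustration.

    ```java
    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical stand-in for an update processor: by default it simply
    // delegates to the next link in the chain, if there is one.
    abstract class SketchProcessor {
        private final SketchProcessor next;

        SketchProcessor(SketchProcessor next) {
            this.next = next;
        }

        void processAdd(Map<String, Object> doc) {
            if (next != null) {
                next.processAdd(doc);
            }
        }
    }

    // Plays the role of the custom processor: adds client_uid, then delegates.
    final class AddClientIdSketch extends SketchProcessor {
        private final long clientId;

        AddClientIdSketch(long clientId, SketchProcessor next) {
            super(next);
            this.clientId = clientId;
        }

        @Override
        void processAdd(Map<String, Object> doc) {
            doc.put("client_uid", clientId);
            super.processAdd(doc);
        }
    }

    // Plays the role of the final indexing step: records a snapshot of each
    // document exactly as it looks when it reaches the end of the chain.
    final class IndexSketch extends SketchProcessor {
        final List<Map<String, Object>> indexed = new ArrayList<>();

        IndexSketch() {
            super(null);
        }

        @Override
        void processAdd(Map<String, Object> doc) {
            indexed.add(new LinkedHashMap<>(doc));
        }
    }
    ```

    Building the chain as new AddClientIdSketch(3L, new IndexSketch()) means every document already carries client_uid by the time the indexing step sees it, which is why the required-field check no longer fails.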
    

    Code for the Solr plugin. The .jar file goes into the lib folder next to solr.xml; you need to create the lib folder yourself the first time.

    package mypackage.solr; // must match the class attribute used in solrconfig.xml

    import java.io.IOException;

    import org.apache.solr.common.SolrException;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    /**
     * Created by Daniel on 06/01/2016.
     *
     * Adds an extra field to each document so that results can be filtered by the client id they belong to.
     * The value is not part of the indexed feed; it is supplied as a parameter in the URL.
     */
    public class MyNewProcessorFactory extends UpdateRequestProcessorFactory {

        @Override
        public UpdateRequestProcessor getInstance(SolrQueryRequest solrQueryRequest, SolrQueryResponse solrQueryResponse, UpdateRequestProcessor next) {
            // Return the processor, not the factory itself
            return new MyNewProcessor(solrQueryRequest, next);
        }
    }

    class MyNewProcessor extends UpdateRequestProcessor {
        private final SolrQueryRequest solrQueryRequest;

        MyNewProcessor(SolrQueryRequest solrQueryRequest, UpdateRequestProcessor next) {
            super(next); // the base class keeps the reference to the next processor in the chain
            this.solrQueryRequest = solrQueryRequest;
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument document = cmd.getSolrInputDocument();
            SolrParams params = solrQueryRequest.getParams();

            // getInt returns null when the parameter is absent, so check before using it
            Integer clientId = params.getInt("clientId");
            if (clientId == null) {
                throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Missing required parameter: clientId");
            }

            document.addField("client_uid", clientId);

            super.processAdd(cmd); // pass the document on to the next processor
        }
    }
    

    And my HTTP call looks like this:

    curl "http://localhost:8983/solr/1-3/update?commit=true&commitWithin=10000&tr=updateXml.xsl&overwrite=true&clientId=3&update.chain=mychain" -H "Content-Type: text/xml" --data-binary @myfile.xml
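
    When the same request is issued from code rather than from curl, it is easy to drop a separator between two parameters while concatenating the query string by hand. A minimal sketch, assuming the parameters are kept in an insertion-ordered map, is shown below. UpdateUrlBuilder and its build method are hypothetical helpers written for this illustration; they are not part of Solr or SolrJ, and only the Java standard library is used.

    ```java
    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;
    import java.util.Map;
    import java.util.StringJoiner;

    // Hypothetical helper: joins query parameters with '&' so that a
    // separator cannot be forgotten between two of them.
    final class UpdateUrlBuilder {

        static String build(String baseUrl, Map<String, String> params) {
            StringJoiner query = new StringJoiner("&");
            for (Map.Entry<String, String> entry : params.entrySet()) {
                query.add(encode(entry.getKey()) + "=" + encode(entry.getValue()));
            }
            return baseUrl + "?" + query;
        }

        private static String encode(String value) {
            try {
                return URLEncoder.encode(value, "UTF-8");
            } catch (UnsupportedEncodingException e) {
                // UTF-8 is always available on a compliant JVM
                throw new IllegalStateException(e);
            }
        }
    }
    ```

    With a LinkedHashMap holding commit, tr, clientId and update.chain in that order, the helper yields a query string in which clientId=3 and update.chain=mychain are separate parameters.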