Search code examples
solrfull-text-searchrequesthandler

Solr Custom RequestHandler - optimizing results


Yet another potentially embarrassing question. Please feel free to point any obvious solution that may have been overlooked - I have searched for solutions previously and found nothing, but sometimes it's a matter of choosing the wrong keywords to search for.
Here's the situation: coded my own RequestHandler a few months ago for an enterprise-y system, in order to inject a few necessary security parameters as an extra filter in all queries made to the solr core. Everything runs smoothly until the part where the docs resulting from a query to the index are collected and then returned to the user.

Basically after the filter is created and the query is executed we get a set of document ids (and scores), but then we have to iterate through the ids in order to build the result set, one hit at a time - which is a good 10x slower that querying the standard requesthandler, and only bound to get worse as the number of results increase. Even worse, since our schema heavily relies on dynamic fields for flexibility, there is no way (that I know of) of previously retrieving the list of fields to retrieve per document, other than testing all possible combinations per doc.

The code below is a simplified version of the one running in production, for querying the SolrIndexSearcher and building the response.

Without further ado, my questions are:

  • is there any way of retrieving all results at once, instead of building a response document by document?
  • is there any possibility of getting the list of fields on each result, instead of testing all possible combinations?
  • any particular WTFs in this code that I should be aware of? Feel free to kick me!
//function that queries index and handles results
private void searchCore(SolrIndexSearcher searcher, Query query, 
        Filter filter, int num, SolrDocumentList results) {  

    //Executes the query
    TopDocs col = searcher.search(query,filter, num);

    //results
    ScoreDoc[] docs =  col.scoreDocs;        

    //iterate & build documents
    for (ScoreDoc hit : docs) {
        Document doc = reader.document(hit.doc);
        SolrDocument sdoc = new SolrDocument();

        for(Object f : doc.getFields()) {
            Field fd = ((Field) f);

            //strings
            if (fd.isStored() && (fd.stringValue() != null))
                sdoc.addField(fd.name(), fd.stringValue());
            else if(fd.isStored()) {
                //Dynamic Longs
                if (fd.name().matches(".*_l") ) {
                    ByteBuffer a = ByteBuffer.wrap(fd.getBinaryValue(), 
                            fd.getBinaryOffset(), fd.getBinaryLength());
                    long testLong = a.getLong(0);
                    sdoc.addField(fd.name(), testLong );
                }
                //Dynamic Dates
                else if(fd.name().matches(".*_dt")) {
                    ByteBuffer a = ByteBuffer.wrap(fd.getBinaryValue(), 
                        fd.getBinaryOffset(), fd.getBinaryLength());
                    Date dt = new Date(a.getLong());
                    sdoc.addField(fd.name(), dt );
                }
                //...
            }                 
        }
        results.add(sdoc);
    }
}  


Solution

  • Per OPs request:

    Although this doesn't answer your specific question, I would suggest another option to solve your problem.

    To add a Filter to all queries, you can add an "appends" section to the StandardRequestHandler in the SolrConfig.xml file. Add a "fl" (stands for filter) section and add your filter. Every request piped through the StandardRequestHandler will have the filter appended to it automatically.

    This filter is treated like any other, so it is cached in the FilterCache. The result is fairly fast filtering (through docIds) at query time. This may allow you to avoid having to pull the individual documents in your solution to apply the filtering criteria.