solr solrj solrcloud

How do you configure the /export requestHandler in SolrCloud to use all shards?


I'm using Solr 4.10.2. I got an /export handler working to export large datasets, but when I deployed the config to my SolrCloud cluster I noticed that the export was missing some records.

If I run the same query string through /select and /export, the /export call returns fewer records.

Is there anything special you need to do to get the /export to work in a SolrCloud environment?

  <requestHandler name="/export" class="solr.SearchHandler">
    <lst name="invariants">
      <str name="rq">{!xport}</str>
      <str name="wt">xsort</str>
      <str name="distrib">false</str>
    </lst>

    <arr name="components">
      <str>query</str>
    </arr>
  </requestHandler>

I tried changing the "distrib" parameter to true, hoping that would help, but that caused other errors.

Any suggestions?


Solution

  • The /export handler only exports documents from the core it is called on (the config above forces distrib=false), which is why records held by the other shards are missing. The Streaming Expressions API (available under /stream without any further configuration) is built on top of the /export handler and is meant to be the SolrCloud alternative.

    Streaming expressions also let you process the results as you retrieve them, if that is useful for your use case.

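    The search() source used with /stream takes the same required parameters as /export (q, fl and sort); adding qt="/export" makes it read from each shard's /export handler.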

    But since you're on 4.10.2, you'll have to read clusterstate.json from ZooKeeper, call the /export handler on one replica of each shard (querying every replica would return the same documents more than once), and then merge the results locally.

    You can retrieve this file with the ZooKeeper command-line client:

    zkCli.sh -server ip:2181
    

    and then print the cluster state:

    get /clusterstate.json
    

    For each collection you'll find its shards and their replicas (each replica lists its base_url and core name). Iterate over the shards, pick one active replica per shard, and retrieve your results from the /export handler on that core.
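A minimal Python sketch of that workaround, assuming the 4.x clusterstate.json layout (collection → shards → replicas, where each replica carries base_url, core, state and an optional leader flag) and that every shard's /export response is already sorted on the same field you pass in sort. The function names and the host names in the usage line are hypothetical, and fetch_docs assumes the export writer nests documents under response.docs; check that against a real response before relying on it. It also does not consult /live_nodes, so a replica marked active on a node that just died will still be picked.

```python
import heapq
import json
import urllib.parse
import urllib.request


def export_urls(cluster_state, collection, params):
    """Build one /export URL per active shard of `collection` (hypothetical helper).

    cluster_state is the parsed clusterstate.json; params holds q, sort and fl.
    """
    urls = []
    shards = cluster_state[collection]["shards"]
    for shard_name in sorted(shards):
        shard = shards[shard_name]
        # Skip shards that are no longer active (e.g. a parent left behind by a
        # shard split); their documents also live in the sub-shards.
        if shard.get("state", "active") != "active":
            continue
        active = [r for r in shard["replicas"].values() if r.get("state") == "active"]
        if not active:
            raise RuntimeError("no active replica for " + shard_name)
        # Prefer the leader, otherwise take any active replica. Only ONE replica
        # per shard is queried, or documents would be returned more than once.
        active.sort(key=lambda r: r.get("leader") != "true")
        replica = active[0]
        query = urllib.parse.urlencode(params)
        urls.append(f"{replica['base_url']}/{replica['core']}/export?{query}")
    return urls


def fetch_docs(url):
    """Fetch one shard's export (assumes a {"response": {"docs": [...]}} body)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["response"]["docs"]


def merge_sorted(per_shard_docs, sort_field, descending=False):
    """Merge per-shard doc lists that are each already sorted on sort_field."""
    return list(heapq.merge(*per_shard_docs,
                            key=lambda doc: doc[sort_field],
                            reverse=descending))
```

For example, urls = export_urls(state, "collection1", {"q": "*:*", "sort": "id asc", "fl": "id"}) followed by merge_sorted([fetch_docs(u) for u in urls], "id"). Note that fetch_docs loads each shard's whole response into memory; for very large exports you would want to parse each response incrementally and feed those iterators to heapq.merge instead.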