I'm using Solr 4.10.2. I got an /export handler working to export large datasets. When I deployed the config to my SolrCloud cluster, I noticed that the export was missing some records.
If I run the same query string through /select and /export, the /export call returns fewer records.
Is there anything special you need to do to get the /export to work in a SolrCloud environment?
<requestHandler name="/export" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="rq">{!xport}</str>
    <str name="wt">xsort</str>
    <str name="distrib">false</str>
  </lst>
  <arr name="components">
    <str>query</str>
  </arr>
</requestHandler>
I tried changing the "distrib" parameter to true, hoping that would help, but that caused other errors.
Any suggestions?
The /export endpoint is only relevant to the local node, but the Streaming Expressions API (available under /stream without any further configuration) is built on top of the /export endpoint and is meant to be the Cloud alternative.
This also allows you to process the content when requesting it, if applicable.
The required parameters for /stream are the same as for /export.
But since you're on 4.10.2, you're going to have to request the clusterstate.json from Zookeeper and then query each node by itself, before merging the results locally.
You can retrieve this file by connecting to Zookeeper:
zkCli.sh -server ip:2181
and then retrieve the clusterstate:
get /clusterstate.json
You'll find a list of shards and their replicas for each collection, and you can then iterate over those values and retrieve your results from the /export handler on each server.
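As a rough sketch of that iteration (not a definitive implementation): the Python below takes the text of clusterstate.json, picks one active replica per shard, and builds the /export URL for each. The function name export_urls, the sample cluster state, and the query parameters are made up for illustration. The JSON layout (collection, then "shards", then "replicas" with "base_url", "core" and "state") is assumed to match what a 4.x cluster writes, so check it against your own file.

```python
import json
from urllib.parse import urlencode

# Hypothetical sample shaped like a Solr 4.x clusterstate.json; replace it
# with the output of "get /clusterstate.json" from your own ZooKeeper.
SAMPLE_CLUSTERSTATE = """
{"collection1": {"shards": {
  "shard1": {"state": "active", "replicas": {
    "core_node1": {"state": "active", "core": "collection1_shard1_replica1",
                   "base_url": "http://10.0.0.1:8983/solr", "leader": "true"},
    "core_node3": {"state": "active", "core": "collection1_shard1_replica2",
                   "base_url": "http://10.0.0.3:8983/solr"}}},
  "shard2": {"state": "active", "replicas": {
    "core_node2": {"state": "down", "core": "collection1_shard2_replica1",
                   "base_url": "http://10.0.0.2:8983/solr"},
    "core_node4": {"state": "active", "core": "collection1_shard2_replica2",
                   "base_url": "http://10.0.0.4:8983/solr", "leader": "true"}}}}}}
"""


def export_urls(clusterstate_text, collection, params):
    """Return one /export URL per shard, using the first active replica."""
    state = json.loads(clusterstate_text)
    urls = []
    for shard_name, shard in sorted(state[collection]["shards"].items()):
        # Query exactly one replica per shard; hitting several replicas of
        # the same shard would return duplicate documents.
        active = [replica for _, replica in sorted(shard["replicas"].items())
                  if replica.get("state") == "active"]
        if not active:
            raise RuntimeError("no active replica for " + shard_name)
        replica = active[0]
        urls.append("{}/{}/export?{}".format(
            replica["base_url"], replica["core"], urlencode(params)))
    return urls


if __name__ == "__main__":
    # Example parameters only; use the same query you send to /select.
    query = {"q": "*:*", "sort": "id asc", "fl": "id"}
    for url in export_urls(SAMPLE_CLUSTERSTATE, "collection1", query):
        print(url)
```

Each URL can then be fetched with any HTTP client. Because every shard returns its results already sorted, the per-shard streams can be merged locally on the same sort key instead of being re-sorted in full.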