apache-nifi marklogic marklogic-10

Is there any way to limit the number of documents returned by a structured query?


I'm using the NiFi QueryMarkLogic processor to fetch a collection of 600,000 documents and am getting a cryptic error. I'm using a structured cts:collection-query. My hunch is that the amount of data is simply too large, so I'd like to fetch fewer than the full set of documents. In Query Console I could just do something like fn:subsequence(cts:search(fn:doc(), cts:collection-query("My Collection")), 1, 1000), but I can't find any way to do the equivalent in a structured query or even in a combined query.
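The Query Console expression keeps only result positions 1 through 1000. To make those semantics concrete, here is a small Python function that mimics fn:subsequence for integer arguments. It is an illustration only, not MarkLogic code: the start position is 1-based, and positions before 1 simply do not exist, so a start of 0 with a length of 2 yields a single item.

```python
def subsequence(seq, start, length=None):
    """Mimic XQuery fn:subsequence for integer start/length.

    Keeps items whose 1-based position p satisfies
    start <= p < start + length (or p >= start if length is omitted).
    """
    first = max(start, 1)  # positions before 1 do not exist
    if length is None:
        return list(seq[first - 1:])
    end = start + length  # exclusive upper bound, 1-based
    if end <= first:
        return []
    return list(seq[first - 1:end - 1])


if __name__ == "__main__":
    results = [f"doc{n}" for n in range(1, 600_001)]
    print(len(subsequence(results, 1, 1000)))  # 1000
    print(subsequence(["a", "b", "c"], 0, 2))  # ['a']
```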

Edit: The specific error I'm getting is:

org.apache.nifi.processor.exception.ProcessException: 
org.apache.nifi.processor.exception.FlowFileHandlingException: 
StandardFlowFileRecord[uuid=0461...] transfer relationship not specified. 
This FlowFile was created in this session and was not transferred to any Relationship 
via ProcessSession.transfer()

Solution

  • If the QueryMarkLogic processor does not do what you want, you can always fall back to the ExecuteScriptMarkLogic processor.

    Please note that all of the MarkLogic NiFi processors have recently been reworked and standardized, so if you are using an old version, I suggest upgrading.

    Related to what you are trying to do, I have done something similar in the past:

    1. Use ExecuteScriptMarkLogic to segment the data into collections, returning the collection names.

    2. Pass those names individually to the QueryMarkLogic processor.
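The two steps above can be sketched roughly in Python. This is a hypothetical illustration, not the script described in this answer: it splits a list of document URIs into fixed-size segments, gives each segment an invented collection name (`My Collection-part-0001`, and so on), and builds one JSON structured `collection-query` per name that could then be handed to QueryMarkLogic. The segment size, the naming scheme, and the function names are all assumptions; in practice the segmentation would run server-side in ExecuteScriptMarkLogic, and the query shape should be checked against the MarkLogic structured query documentation.

```python
import json


def segment_uris(uris, segment_size, base_name):
    """Map hypothetical segment collection names to their URI lists (step 1)."""
    segments = {}
    for i in range(0, len(uris), segment_size):
        name = f"{base_name}-part-{i // segment_size + 1:04d}"
        segments[name] = uris[i:i + segment_size]
    return segments


def collection_query(name):
    """Build a JSON structured query matching a single collection (step 2)."""
    return json.dumps(
        {"query": {"queries": [{"collection-query": {"uri": [name]}}]}}
    )


if __name__ == "__main__":
    uris = [f"/doc/{n}.json" for n in range(600_000)]
    segments = segment_uris(uris, 50_000, "My Collection")  # 12 segments
    for name in segments:  # one query per collection name
        print(collection_query(name))
```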

    However, much of the MarkLogic code for Apache NiFi handles batching internally, so I'd check the newest docs closely first.
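As a rough illustration of why internal batching matters at this size: a batch-oriented reader never holds all 600,000 results at once, and instead hands them on one fixed-size group at a time. The generator below is a generic Python sketch of that idea with a hypothetical batch size of 100. It is not the MarkLogic or NiFi implementation.

```python
from itertools import islice


def batches(items, batch_size):
    """Yield lists of up to batch_size items without materializing the whole input."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # A generator stands in for a stream of 600,000 document URIs.
    uris = (f"/doc/{n}.json" for n in range(600_000))
    print(sum(1 for _ in batches(uris, 100)))  # 6000 batches
```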