I need to continuously move documents from a MarkLogic content database to Azure Blob Storage, and I am planning to use NiFi to accomplish that.
The problem is how to mark each document as processed in the MarkLogic content database so that each document is only processed once. I am thinking of using the ExecuteScriptMarkLogic processor, but I could not figure out how to get the URI of the incoming FlowFile.
My idea is to add a collection such as "Processed" to the document in MarkLogic, so that the QueryMarkLogic processor will not pick it up again:
xdmp:document-add-collections($uri, "Processed")
https://marklogic.github.io/nifi/getting-started
How can I do that?
A good technique with NiFi is to add a LogAttribute processor and log the payload and all attributes of the message. If you configure QueryMarkLogic to return documents (or documents plus metadata), then the URI of each document will be in the "filename" attribute of the FlowFile.
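Putting that together with the collection idea from the question: assuming the ExecuteScriptMarkLogic Script Body property supports NiFi Expression Language (worth verifying against the processor documentation), a minimal sketch would reference the "filename" attribute directly in the script. The collection name here is the one from the question; everything else is an assumption:

```xquery
(: Hypothetical Script Body for ExecuteScriptMarkLogic.
   Assumes ${filename} is resolved by NiFi Expression Language
   to the URI carried on the incoming FlowFile. :)
xdmp:document-add-collections("${filename}", "Processed")
```

If the property does not evaluate Expression Language, an UpdateAttribute or ReplaceText processor upstream could inject the URI into the script text instead.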
If you're updating the document only so that QueryMarkLogic won't retrieve it again, I recommend instead constraining your query on a timestamp in your data, if one exists.
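As a sketch of that approach: assuming your documents carry a dateTime property (here a hypothetical lastModified, which would need a range index configured), QueryMarkLogic could be given a MarkLogic structured query that matches only documents newer than the last successful run:

```json
{
  "query": {
    "queries": [
      {
        "range-query": {
          "json-property": "lastModified",
          "type": "xs:dateTime",
          "range-operator": "GT",
          "value": ["2023-01-01T00:00:00Z"]
        }
      }
    ]
  }
}
```

The property name, index, and cutoff value are illustrative; the advantage over adding a "Processed" collection is that the pipeline never has to write back into the source database.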