apache-nifi marklogic

MarkLogic NiFi custom transform logic: implementation approach comparison


This is related to this MarkLogic NiFi question, but I think it is worth spinning off as a new question.

I am planning to use a "sync flag" collection on each document to implement a weekly sync of changed documents between a MarkLogic database and a file folder. The "sync flag" will be turned on whenever the document changes. Which approach should I use to turn off that "sync flag" from a NiFi flow, and why?

I think all three methods below could work.

  1. Follow (https://docs.marklogic.com/guide/rest-dev/transforms) to use the server transform setting in the QueryMarkLogic processor.
  2. Follow (https://docs.marklogic.com/guide/rest-dev/extensions) to implement a REST API extension and use the CallRestExtension processor.
  3. Write a simple xdmp:document-remove-collections XQuery script and run it with the ExecuteScript processor.

For methods 1 and 2, will those two MarkLogic NiFi processors automatically place the URI in the context? Since a NiFi processor handles documents one at a time, I assume it will call the back end (the preloaded XQuery) once for each matched document.

For method 3, how do I pass in the URI? Or is that not possible with this approach?

I would like a comparison of these approaches to clarify my understanding.


Solution

  • #1 does not make sense to me. The transform in that context is for transforming content coming out of the query. There is also ApplyTransformMarkLogic, but I am not sure whether it only transforms the document or can also change collections.

    #2 CallRestExtension would be fine. However, you need to set query parameters for the URI in the processor configuration and then consume them in the REST endpoint code.

    #3 is simple. Have the URI available as a FlowFile attribute. Then, in the script body, reference it, e.g. let $uri := '${uri}', and call xdmp:document-remove-collections() on it.
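
To make #3 concrete, here is a minimal sketch of what the script body might look like. It assumes the FlowFile carries an attribute named uri, that the processor resolves NiFi Expression Language in the script body before sending it to MarkLogic, and that the flag collection is called "sync-flag" (a hypothetical name; substitute your own).

```xquery
xquery version "1.0-ml";

(: Assumption: NiFi replaces ${uri} with the value of the FlowFile
   attribute "uri" before this script is evaluated in MarkLogic. :)
let $uri := '${uri}'

(: Assumption: "sync-flag" is a placeholder collection name. :)
let $flag := "sync-flag"

return
  (: Skip documents deleted since the query ran, so the flow does not
     fail with a document-not-found error. :)
  if (fn:doc-available($uri))
  then xdmp:document-remove-collections($uri, $flag)
  else ()
```

Because the URI is spliced into the script as a string literal, a URI containing a single quote would break the script, so this is only safe if your URIs are known not to contain one. Passing the URI as a request parameter to a REST extension (#2) avoids that concern.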