marklogic, marklogic-dhf

How to abort some but not all ingest flows in MarkLogic Data Hub Framework


We are working with the MarkLogic Data Hub Framework and ingesting documents into a unitemporal database via the REST multi-document write endpoint (v1/documents).

Sometimes we receive updates this way for documents that have not actually changed. In that case we do not want to write them to MarkLogic: because the data is unitemporal, each write creates a new version, which results in misleading timestamps and wasted storage space.

We have written some code to detect duplicates (using hashing). However, we do not know how to abort the ingestion of a duplicate document while the non-duplicate documents in the same request are still processed. That is, when a single request contains both duplicate and non-duplicate documents, how can we write only the non-duplicates? The Data Hub Framework does not have a plugin that controls the document write itself (this is handled by the REST API). We tried to throw an fn:error() in the content plugin, but unfortunately that aborts the whole multi-document write instead of only the writes for the documents that raised the error.
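For readers unfamiliar with the hashing approach mentioned above, here is a minimal, framework-agnostic sketch in Python. It is not the code from this question and uses no MarkLogic API: the names `content_hash`, `is_duplicate`, and `known_hashes` are hypothetical, and it assumes documents are JSON-like dictionaries whose previous hash is stored per URI.

```python
import hashlib
import json


def content_hash(doc: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON serialisation of doc."""
    # sort_keys and fixed separators make the result independent of
    # key order and whitespace in the incoming payload.
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_duplicate(uri: str, doc: dict, known_hashes: dict) -> bool:
    """True when doc hashes to the same value as the version stored under uri."""
    return known_hashes.get(uri) == content_hash(doc)


# Hypothetical lookup table: URI -> hash of the last version that was written.
known_hashes = {"/orders/1.json": content_hash({"id": 1, "status": "open"})}

print(is_duplicate("/orders/1.json", {"status": "open", "id": 1}, known_hashes))    # True
print(is_duplicate("/orders/1.json", {"id": 1, "status": "closed"}, known_hashes))  # False
```

Detecting the duplicate is the easy part; the open question is how to skip only that document inside a multi-document write.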


Solution

  • We eventually discussed this with a MarkLogic Solution Architect, and the conclusion was that this is not possible with the default v1/documents API.

    To resolve this, we wrote our own custom API as a resource extension under v1/resources. This API calls the Data Hub Framework code and then writes only the documents that are not duplicates.
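The control flow of such a custom endpoint can be sketched roughly as follows. This is a framework-agnostic Python illustration, not the actual resource extension: `ingest_batch`, `run_flow`, `store`, and `hashes` are hypothetical names, the dictionaries stand in for the database, and `run_flow` stands in for the call into the Data Hub Framework code. In MarkLogic itself the write step would presumably be a temporal insert (for example `temporal:document-insert`), which is an assumption and not something the answer states.

```python
import hashlib
import json


def _digest(doc: dict) -> str:
    # Canonical JSON so that key order does not change the hash.
    return hashlib.sha256(json.dumps(doc, sort_keys=True).encode("utf-8")).hexdigest()


def ingest_batch(batch, store, hashes, run_flow=lambda uri, doc: doc):
    """Handle each document on its own: write it only if its content changed."""
    written, skipped = [], []
    for uri, raw in batch:
        doc = run_flow(uri, raw)      # placeholder for the ingest flow
        digest = _digest(doc)
        if hashes.get(uri) == digest:
            skipped.append(uri)       # duplicate: skip this one, keep going
            continue
        store[uri] = doc              # placeholder for the actual write
        hashes[uri] = digest
        written.append(uri)
    return {"written": written, "skipped": skipped}


store, hashes = {}, {}
ingest_batch([("/a.json", {"v": 1}), ("/b.json", {"v": 1})], store, hashes)
result = ingest_batch([("/a.json", {"v": 1}), ("/b.json", {"v": 2})], store, hashes)
print(result)  # {'written': ['/b.json'], 'skipped': ['/a.json']}
```

The point of the design is that a duplicate is skipped with a `continue` instead of an error, so it never aborts the writes for the other documents in the same request. Returning the skipped URIs lets the caller see which documents were ignored.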