marklogic marklogic-8

CPF and task server in MarkLogic 8


I created CPF pipelines that run on insert and update of documents. These pipelines spawn multiple xdmp:spawn tasks to perform a variety of tasks. I have a couple of questions about this approach.

  1. Some of the spawned tasks will modify the original document. Will this trigger the CPF update workflow? I could set a flag on the document to indicate that the update came from a spawned task, but is there a more elegant way to do this?
  2. Do I need to worry about deadlocks? If two tasks spawned from the same CPF pipeline try to update the same document at the same time, how can I avoid this?

Basically, I am trying to use the envelope pattern for my inserted documents and wrap all the artifact documents into one single document. The reason I am using CPF to generate these artifact documents is that I can load documents into the database using MLCP or any other method and let CPF handle the processing, instead of writing a custom REST endpoint and ingesting every document through it.


Solution

  • I would recommend not spawning tasks to apply updates to the document at hand. Such updates would interfere with the natural flow of CPF states. CPF itself is designed to take a document through multiple states, and have each transition contribute to the total transformation.

    So, let the document go through multiple CPF states, and do your updates in one or more of those state transitions (pipelines). Each state transition is already a separate transaction, so there is no need to spawn tasks.

    I don't think you need to worry about deadlocks, but I can easily imagine that running updates in parallel will cause some of them to get lost, because they may overwrite each other. Whether that happens depends on the exact code you are using. MarkLogic's transaction mechanism normally guards against that, but it is easy to write code that bypasses that mechanism.

    HTH!
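
To make the lost-update risk from the answer concrete, here is a minimal sketch in Python. It is not MarkLogic code and uses no MarkLogic API. It is a simplified in-memory model built on assumptions: the "store" is a plain dictionary, the document URI and artifact names are hypothetical, and the spawned tasks are assumed to work from a copy of the envelope taken before spawning and then write the whole envelope back. Under those assumptions, the last writer wins and earlier artifacts disappear, whereas steps that re-read the current document one after another (as chained state transitions would) keep every artifact.

```python
import copy


def add_artifact(envelope, name, value):
    """Return a new envelope with one more artifact; the input is not modified."""
    updated = copy.deepcopy(envelope)
    updated["artifacts"][name] = value
    return updated


def run_tasks_with_stale_copies(store, uri, tasks):
    """Model of spawned tasks that all start from the same pre-spawn copy."""
    snapshot = copy.deepcopy(store[uri])  # read once, before any task runs
    for name, value in tasks:
        # Each task builds on the stale snapshot and overwrites the whole document.
        store[uri] = add_artifact(snapshot, name, value)


def run_as_sequential_steps(store, uri, tasks):
    """Model of chained steps: each one re-reads the current document first."""
    for name, value in tasks:
        store[uri] = add_artifact(store[uri], name, value)


def fresh_store():
    # Hypothetical URI and envelope layout, for illustration only.
    return {"/docs/a.json": {"content": "original", "artifacts": {}}}


# Hypothetical artifact-producing tasks.
TASKS = [("summary", "short text"), ("entities", ["MarkLogic"])]

parallel = fresh_store()
run_tasks_with_stale_copies(parallel, "/docs/a.json", TASKS)

sequential = fresh_store()
run_as_sequential_steps(sequential, "/docs/a.json", TASKS)

print(sorted(parallel["/docs/a.json"]["artifacts"]))    # ['entities']
print(sorted(sequential["/docs/a.json"]["artifacts"]))  # ['entities', 'summary']
```

In the first run the "summary" artifact is silently lost because the second task never saw it. The second run mirrors the recommendation above: one update per step, each starting from the document as it currently stands.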