Tags: marklogic, marklogic-9, mlcp, marklogic-dhf

MarkLogic - Incremental load using MLCP


MarkLogic version : 9.0-6.2

We are trying to use mlcp to load daily changes of customer data into data-hub-STAGING and then use a harmonize flow to bring changes into data-hub-FINAL.

As I understand it, 'collector.sjs' returns the URIs that need to be harmonized. After the full load on day 1, is there a way for the collector to identify changes from the previous day and harmonize only those records?

I have a few possible designs:

  1. Save the batch run time with every run, and have the collector return only the URIs whose batch run time is later than the previously saved one (each document is saved with its batch run time).

  2. Save each document to two collections (customer and customer_currentDate), and have the collector return documents from the customer_currentDate collection. However, this breaks if the ingest and harmonize steps run on different days.

  3. Save each document to two collections (customer and customer_batchDateTime), and create a marker (something like a row in a PROCESS collection with PROCESS_IND set to 'N'). The collector would then sweep the PROCESS collection for entries with PROCESS_IND set to 'N' and return the documents from the matching customer_batchDateTime collection. Finally, writer.sjs would set PROCESS_IND to 'Y'.

Before proceeding with any of these options, I want to check whether the ingest or harmonize process has any built-in capability for identifying the delta/changed records, so that I am not over-engineering this.


Solution

  • I think the closest thing to a "built-in" capability is the ability to pass options to the collector module. That lets you choose whichever approach works best for restricting what the collector returns.

    For your use case, the simplest approach is probably to insert each day's documents into a collection named, e.g., "input-(current date)", and then pass that collection name as an option to the collector module so that it can apply a collection query.