marklogic, mlcp, marklogic-corb

Difference between CORB and MLCP MarkLogic


Are there any differences between CoRB and MLCP for MarkLogic?

I see they do the same kind of job. In what scenarios would you use one versus the other?


Solution

  • CoRB and MLCP are both Java-based tools that communicate with MarkLogic via the XCC protocol.

    There is a lot of overlap in functionality. Both can be used to load data into the database, perform bulk transformations of documents, export data, and generate reports.

    • MLCP is a supported product offering that is produced by MarkLogic
    • CoRB is an open source community effort

    MLCP knows how to produce and consume the MarkLogic Archive format, and it makes it easy to copy data between clusters.
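As a rough illustration of the cluster-to-cluster copy and archive export mentioned above, here is a minimal Python sketch that only assembles the MLCP command lines; it does not run them. The host names, port, credentials, and output directory are placeholder assumptions, and the helper function names are hypothetical. The switches themselves are standard MLCP `copy` and `export` options.

```python
import shlex


def mlcp_copy_command(src_host, dst_host, port=8000,
                      user="admin", password="admin", thread_count=4):
    """Build the argument list for an `mlcp.sh copy` run between two clusters.

    All connection values are placeholders; substitute your own.
    """
    return [
        "mlcp.sh", "copy",
        "-input_host", src_host,
        "-input_port", str(port),
        "-input_username", user,
        "-input_password", password,
        "-output_host", dst_host,
        "-output_port", str(port),
        "-output_username", user,
        "-output_password", password,
        "-thread_count", str(thread_count),
    ]


def mlcp_export_archive_command(host, output_dir, port=8000,
                                user="admin", password="admin"):
    """Build the argument list for exporting a database as a MarkLogic Archive."""
    return [
        "mlcp.sh", "export",
        "-host", host,
        "-port", str(port),
        "-username", user,
        "-password", password,
        "-output_type", "archive",        # write archive format rather than plain documents
        "-output_file_path", output_dir,  # hypothetical target directory
    ]


if __name__ == "__main__":
    # Print shell-quoted commands that could be pasted into a terminal.
    print(shlex.join(mlcp_copy_command("source.example.com", "target.example.com")))
    print(shlex.join(mlcp_export_archive_command("source.example.com", "/tmp/ml-archive")))
```

An archive produced this way could later be loaded into another cluster with `mlcp.sh import` and `-input_file_type archive`.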

    CoRB provides a lot of pre-built functionality, but it is also possible to customize behaviors by "plugging in" your own Java tasks or XQuery/JavaScript modules instead of using the pre-built ones that are provided.
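To make the "plugging in" idea a little more concrete, the following Python sketch generates the text of a CoRB options file. It is a sketch under assumptions, not a canonical setup: the connection URI, module file names, and the `com.example.MyCustomTask` class are hypothetical, while the option names (`XCC-CONNECTION-URI`, `URIS-MODULE`, `PROCESS-MODULE`, `PROCESS-TASK`, `THREAD-COUNT`) are CoRB job options.

```python
def corb_options(xcc_uri, uris_module, process_module=None,
                 process_task=None, thread_count=8):
    """Collect CoRB job options in a dict.

    uris_module selects the URIs to process; process_module is the
    XQuery/JavaScript module run for each URI; process_task optionally
    names a Java task class to use instead of the default one.
    """
    options = {
        "XCC-CONNECTION-URI": xcc_uri,
        "URIS-MODULE": uris_module,
        "THREAD-COUNT": str(thread_count),
    }
    if process_module is not None:
        options["PROCESS-MODULE"] = process_module
    if process_task is not None:
        options["PROCESS-TASK"] = process_task
    return options


def to_properties(options):
    """Render the options as key=value lines for a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in options.items()) + "\n"


if __name__ == "__main__":
    # Module-based job: the "|ADHOC" suffix asks CoRB to read the module
    # from the local filesystem instead of the modules database.
    module_job = corb_options(
        "xcc://user:password@localhost:8000",   # placeholder credentials
        "uris.xqy|ADHOC",                       # hypothetical selector module
        process_module="transform.xqy|ADHOC",   # hypothetical transform module
    )
    print(to_properties(module_job))

    # Custom Java task job: the class name below is purely illustrative.
    task_job = corb_options(
        "xcc://user:password@localhost:8000",
        "uris.xqy|ADHOC",
        process_task="com.example.MyCustomTask",
    )
    print(to_properties(task_job))
```

Saved as, say, `job.properties`, such a file would typically be passed to CoRB with something like `java -cp <corb and xcc jars> -DOPTIONS-FILE=job.properties com.marklogic.developer.corb.Manager`; the exact jar names depend on the versions in use.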

    Both provide an engine for executing bulk tasks against MarkLogic that can be customized through properties, command-line switches, and custom JavaScript or XQuery modules.

    In many cases, either tool can be used to accomplish the work and it is just a matter of personal preference or expertise.

    A high-level overview of features, showing some similarities and differences:

    CoRB MLCP
    Uses XCC protocol
    Java-based
    Command-line utility
    Execute XQuery modules
    Execute JavaScript modules
    Execute custom Java tasks
    Multiple customizable stages of processing
    Import from CSV
    Import files from directory
    Import files from zip
    Import XML file (splitting into multiple documents)
    Import MarkLogic Archive
    Export MarkLogic Archive
    Bulk reprocess database records
    Produce CSV
    Deduplicate and sort exported text file
    Export documents
    Export as zip
    Bulk schema validation
    Web UI and endpoints to display status and dynamically adjust threads or pause/resume jobs
    Manually adjust threads or pause/resume jobs
    Auto-scaling to adjust threads
    MarkLogic supported product
    Apache 2 open source license