
How do I write multiple output files using CoRB?


By default, when I run a CoRB job that returns data from the process module, that data is streamed into a single file on the CoRB client. I need to write the output to different files instead, one file per URI being processed. How do I write CoRB output to multiple files rather than one large file?

I have a CoRB job that returns the URI today, and those URIs are streamed together into one output file with each URI on a new line. I would prefer to have a directory filled with files, and have one file per URI.


Solution

  • CoRB has two built-in Tasks that can be used to write the output of the PROCESS-MODULE to the filesystem.

    • ExportBatchToFileTask: Generates a single file, typically used for reports. Writes the data returned by the PROCESS-MODULE to the single file specified by EXPORT-FILE-NAME. All values returned over the entire CoRB job are streamed into that one file.
    • ExportToFileTask: Generates multiple files. Saves the document returned by each invocation of the PROCESS-MODULE to a separate local file within EXPORT-FILE-DIR, where the file name for each document is based on the URI.

    It is common to write CoRB jobs that generate a CSV or other report by appending the output of each PROCESS-MODULE execution to a single file. If you specify the EXPORT-FILE-NAME option, CoRB will automatically use ExportBatchToFileTask by setting the PROCESS-TASK option for you (unless you have explicitly set PROCESS-TASK yourself):

    PROCESS-TASK=com.marklogic.developer.corb.ExportBatchToFileTask
    

    However, if you would prefer to have the result of each process module execution saved as its own output file, for a multi-threaded download/export, then you would want to configure the ExportToFileTask. It uses the URI sent to the process module to construct a directory structure and filename, and saves the result of the transform to that file path.

    You can set the EXPORT-FILE-DIR to provide a base directory in which to write out those files.

    So, to configure CoRB to write the result of each PROCESS-MODULE execution to its own file, you would want to have the following options set for your CoRB job:

    PROCESS-TASK=com.marklogic.developer.corb.ExportToFileTask
    EXPORT-FILE-DIR=/tmp/export
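
    As a rough sketch, a fuller options file for such a job might look like the one below. The connection string, module file names, and thread count are placeholder assumptions, not values from the question; only PROCESS-TASK and EXPORT-FILE-DIR are specific to the per-URI export.

    # Hypothetical connection details; replace with your own
    XCC-CONNECTION-URI=xcc://user:password@localhost:8000/Documents
    # Hypothetical module file names; the |ADHOC suffix tells CoRB to read
    # the module from the client side instead of the modules database
    URIS-MODULE=uris.xqy|ADHOC
    PROCESS-MODULE=process.xqy|ADHOC
    # Assumed value; tune for your cluster
    THREAD-COUNT=8
    # Write one file per URI under the export directory
    PROCESS-TASK=com.marklogic.developer.corb.ExportToFileTask
    EXPORT-FILE-DIR=/tmp/export

    With settings like these, the output for a URI such as /orders/123.xml should end up in a file under /tmp/export whose path mirrors the URI. Leave EXPORT-FILE-NAME unset, or set PROCESS-TASK explicitly as shown, so that CoRB does not default to ExportBatchToFileTask.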