By default, when I run a CoRB job that returns data from the process function, that data is streamed into a single file on the CoRB client. I need to write the output to different files instead, one file per URI that is being processed. How do I write CoRB output into multiple files instead of one large file?
I have a CoRB job that returns the URI today, and those URIs are streamed together into one output file with each URI on a new line. I would prefer to have a directory filled with files, and have one file per URI.
CoRB has two built-in Tasks that can be used to write the output of the PROCESS-MODULE to the filesystem.
ExportBatchToFileTask
Generates a single file, typically used for reports. Writes the data returned by the PROCESS-MODULE to a single file specified by EXPORT-FILE-NAME. All values returned over the entire CoRB job are streamed into that single file.
ExportToFileTask
Generates multiple files. Saves the documents returned by each invocation of PROCESS-MODULE to a separate local file within EXPORT-FILE-DIR, where the file name for each document is based on the URI.
It is common for people to write CoRB jobs that generate CSV and other reports by appending the output of each PROCESS-MODULE execution to a single file. If you specify the EXPORT-FILE-NAME option, then CoRB will automatically use ExportBatchToFileTask by setting the PROCESS-TASK option for you (unless you have explicitly set the PROCESS-TASK option):
PROCESS-TASK=com.marklogic.developer.corb.ExportBatchToFileTask
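For contrast, a single-file report job can rely on that default and leave PROCESS-TASK unset. The fragment below is a minimal sketch; the directory and the file name `report.csv` are illustrative assumptions, not values from the question.

```properties
# Single-file (batch) export: setting EXPORT-FILE-NAME is enough for CoRB
# to select ExportBatchToFileTask, per the behaviour described above.
# Both values below are hypothetical placeholders.
EXPORT-FILE-DIR=/tmp/export
EXPORT-FILE-NAME=report.csv
```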
However, if you would prefer to have the results of each process module execution saved as its own output file, for a multi-threaded download/export, then you would want to configure the ExportToFileTask. It will use the URI sent to the process module to construct a directory structure and filename, and save the results of the transform to that file path.
You can set the EXPORT-FILE-DIR to provide a base directory in which to write out those files.
So, to configure CoRB to write the results of each PROCESS-MODULE execution to its own file, you would want to have the following options set for your CoRB job:
PROCESS-TASK=com.marklogic.developer.corb.ExportToFileTask
EXPORT-FILE-DIR=/tmp/export
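Put together, an options file for a one-file-per-URI export might look like the sketch below. Only PROCESS-TASK and EXPORT-FILE-DIR come from the answer above; the connection string, the module names `uris.xqy` and `export.xqy`, and the thread count are placeholder assumptions to be replaced with your own values.

```properties
# Hypothetical connection details; substitute your own host, port, and credentials.
XCC-CONNECTION-URI=xcc://user:password@localhost:8000

# Hypothetical module names: one selects the URIs, the other returns
# the content to save for each URI.
URIS-MODULE=uris.xqy|ADHOC
PROCESS-MODULE=export.xqy|ADHOC

# Assumed thread count; tune for your environment.
THREAD-COUNT=8

# Write each PROCESS-MODULE result to its own file under EXPORT-FILE-DIR.
# EXPORT-FILE-NAME is deliberately not set, since that option belongs to
# the single-file ExportBatchToFileTask.
PROCESS-TASK=com.marklogic.developer.corb.ExportToFileTask
EXPORT-FILE-DIR=/tmp/export
```

With a configuration along these lines, a URI such as `/orders/123.xml` would be expected to produce a file under `/tmp/export` whose path mirrors the URI, since the task derives the directory structure and filename from it. The file can be handed to CoRB through its OPTIONS-FILE option.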