CORB writing fewer records to CSV?


I am running CORB to convert my data to CSV. With THREAD-COUNT=1 it works perfectly: the output for every document appears in the CSV file. But when I increase the thread count and batch size, the CSV file contains fewer rows than expected. I don't know why.

Below is my Properties file

THREAD-COUNT=5
BATCH-SIZE=10
URIS-MODULE=selector.sjs|ADHOC
PROCESS-MODULE=transform.sjs|ADHOC
PROCESS-TASK=com.marklogic.developer.corb.ExportBatchToFileTask
EXPORT-FILE-NAME=HelloWorldReport.csv
PRE-BATCH-TASK=com.marklogic.developer.corb.PreBatchUpdateFileTask
EXPORT-FILE-TOP-CONTENT=a,b,c,d,e,uri

In the CORB command prompt output I can see all the URIs, but only a few of them are written to the CSV.

I followed this documentation to set up my selector and transform modules. Below is my selector.sjs module:

var total = cts.uris("", null, cts.collectionQuery("data"));
fn.insertBefore(total,0,fn.count(total))

In my transform.sjs I read the elements from my documents and then concatenate them:

var name = fn.tokenize(URI, ";");
for ( var uri of name) {
let obj = fn.head(fn.doc(uri)).toObject();
var a = obj.Name;
var b = obj.Country;
var c = obj.State;
var d = obj.Code;
var e = obj.University;
fn.concat(a,b,c,d,e,uri);
}

Also, is there a function in MarkLogic for putting a delimiter between values? In the fn.concat call above the strings are joined directly (abcdeuri), but I want them separated by commas (a,b,c,d,e,uri). I tried using fn.stringJoin, but I can't pass more than three values to it.

Any help is appreciated.

Thanks


Solution

  • The issue is that only the last evaluated expression is returned from a JavaScript module. You are generating strings inside of the for loop, so when you set the BATCH-SIZE greater than 1, only the last item from your for loop is being returned.
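
    The behavior described above can be reproduced outside MarkLogic. The following is a minimal sketch in plain JavaScript, assuming that evaluating a script with eval is a fair stand-in for how a process module's result is taken from its last evaluated expression. The sample URIs and the "row for" strings are made up for illustration.

    ```javascript
    // Plain JavaScript stand-in for a process module; no MarkLogic APIs are used.
    // The body mimics the loop from the question: each iteration evaluates a
    // string expression but never stores it anywhere.
    const moduleBody = `
      var uris = "/a.json;/b.json;/c.json".split(";");
      for (var uri of uris) {
        "row for " + uri;
      }
    `;

    // Indirect eval returns the completion value of the script, which for a
    // loop is the value of the last statement evaluated in its final iteration.
    const returned = (0, eval)(moduleBody);

    console.log(returned); // prints "row for /c.json"
    ```

    Three rows are computed, but only the one from the final iteration survives, which matches a CSV that is missing most of each batch.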

    You could increase THREAD-COUNT while keeping BATCH-SIZE=1, and you should get the desired output without changing the process module.

    In order for your process module to return the desired results with a BATCH-SIZE greater than 1, you need to collect the results as you process the data inside of your for loop, and then return all of the data outside of the for loop. You can collect the data by pushing into an Array variable and then return a Sequence of strings using Sequence.from().

    You can use the fn.stringJoin() function to produce a CSV row. It takes two parameters: the first is the sequence of values, which you can put into an array, and the second is the string to join them with. Pass all of the values together in one array rather than as separate arguments.

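    // URI holds the URIs for this batch, delimited by ";" when BATCH-SIZE > 1.
    var URI;
    var name = fn.tokenize(URI, ";");
    // Collect one CSV row per document instead of discarding it.
    var results = [];
    for (var uri of name) {
      let obj = fn.head(fn.doc(uri)).toObject();
      var a = obj.Name;
      var b = obj.Country;
      var c = obj.State;
      var d = obj.Code;
      var e = obj.University;
      // Join the values with commas to form one row.
      results.push(fn.stringJoin([a,b,c,d,e,uri], ","));
    }
    // The last evaluated expression is what the module returns: every row in the batch.
    Sequence.from(results);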
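
    If any of the values can contain a comma or a double quote, a plain join will produce a malformed row. Below is a minimal sketch in plain JavaScript of one way to handle that, assuming the common CSV convention of wrapping such fields in double quotes and doubling any embedded quotes. The helper names toCsvField and toCsvRow are hypothetical and not part of CORB or MarkLogic, and the sample values are invented.

    ```javascript
    // Hypothetical helpers; these names are not part of CORB or MarkLogic.
    function toCsvField(value) {
      // Treat missing properties as empty fields.
      const text = value === undefined || value === null ? "" : String(value);
      // Quote only when the field contains a delimiter, a quote, or a line break.
      if (/[",\r\n]/.test(text)) {
        return '"' + text.replace(/"/g, '""') + '"';
      }
      return text;
    }

    function toCsvRow(values) {
      // Array.prototype.join is standard JavaScript and accepts any number of values.
      return values.map(toCsvField).join(",");
    }

    // Sample values standing in for obj.Name, obj.Country, and so on.
    const row = toCsvRow(["Ann", "USA", "Austin, TX", undefined, 'The "Best" U', "/doc/1.json"]);
    console.log(row); // prints: Ann,USA,"Austin, TX",,"The ""Best"" U",/doc/1.json
    ```

    In the process module above, results.push(toCsvRow([a, b, c, d, e, uri])) could then take the place of the fn.stringJoin call, if this quoting convention suits whatever reads the report.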