I am exploring the MarkLogic database and have been trying to import data into it using MarkLogic Content Pump (MLCP). Here is the gist of the CSV file.
firstname, middlename, lastname, address1, address2, city, state, zip, country
Rajath,,A,No 20 GN,16th cross,Bangalore,KA,560029,IN
Rajath1,,,No 75,,Dharwad,KA,560057,IN
Rajath2,,B,No 66,,Haveri,KA,560034,IN
Rajath3,,D,No 24A ,25th cross,Raichur,KA,560095,IN
Rajath4,,,No 36B,,Coorg,KA,,IN
I was able to insert it into the database successfully. Here is one of the inserted documents.
{
"firstname" : "Rajath4",
"middlename" : "",
"lastname" : "",
"address1" : "No 36B",
"address2" : "",
"city" : "Coorg",
"state" : "KA",
"zip" : "",
"country" : "IN"
}
MLCP even inserts the fields whose values are blank/null. Is there a way to tell it to ignore blank/null fields when inserting into MarkLogic? Here is what I am expecting.
{
"firstname" : "Rajath4",
"address1" : "No 36B",
"city" : "Coorg",
"state" : "KA",
"country" : "IN"
}
Also, how can I maintain an auto-generated primary key/sequence for each document that we insert? If the ID already exists, it should merge/update the document.
Thanks in advance.
You can use a custom transform to filter the data with your own code.
MLCP can also generate a unique ID, but it is unique only per run, not globally. Luckily, the custom transform feature also lets you change the URI if you like (so you provide the logic that makes it unique), which addresses both of your challenges.
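To make the filtering part concrete, here is a minimal sketch of the kind of logic such a transform could call. It is plain JavaScript rather than a complete MLCP transform module, and the function name `dropBlankFields` is made up for illustration. It assumes the flat record shape from the question and treats null, undefined, and whitespace-only strings as blank.

```javascript
// Hypothetical helper (not part of MLCP): returns a copy of a flat record
// without the fields whose value is null, undefined, or a blank string.
function dropBlankFields(record) {
  const cleaned = {};
  Object.keys(record).forEach(function (key) {
    const value = record[key];
    const isBlank =
      value === null ||
      value === undefined ||
      (typeof value === 'string' && value.trim() === '');
    if (!isBlank) {
      cleaned[key] = value;
    }
  });
  return cleaned;
}

// The document from the question, as MLCP inserted it.
const inserted = {
  firstname: 'Rajath4',
  middlename: '',
  lastname: '',
  address1: 'No 36B',
  address2: '',
  city: 'Coorg',
  state: 'KA',
  zip: '',
  country: 'IN'
};

const cleaned = dropBlankFields(inserted);
console.log(JSON.stringify(cleaned));
// prints {"firstname":"Rajath4","address1":"No 36B","city":"Coorg","state":"KA","country":"IN"}
```

Inside an actual transform you would apply this to the object form of the incoming document and then hand the result back to MLCP as described in the transform documentation.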
DOC: https://docs.marklogic.com/guide/mlcp.pdf
Relevant Sections:
- 4.17 - Custom Transforms
- 4.17.5 - Sample Transform (add your code to filter the content here. In your case, you may prefer to invoke a function in JavaScript, but that is a matter of personal choice.)
- 4.17.6 - Changing the URI
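As a rough illustration of the URI idea, the sketch below derives a stable URI from fields of the record, so that loading the same row again targets the same document instead of creating a new one. Everything named here is an assumption for the example: the key fields, the `/people/` prefix, and the function names `buildUri` and `setStableUri`. The `content` object with `uri` and `value` properties follows the shape described in the transform sections above, and the `toObject()` check is only there so the sketch also runs outside MarkLogic with a plain object.

```javascript
// Hypothetical choice of fields that identify a record; adjust to your data.
const KEY_FIELDS = ['firstname', 'address1', 'city'];

// Builds a deterministic URI from the key fields, skipping blank ones.
function buildUri(record) {
  const slug = KEY_FIELDS
    .map(function (field) { return String(record[field] || '').trim(); })
    .filter(function (part) { return part !== ''; })
    .join('-')
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')   // collapse anything non-alphanumeric
    .replace(/^-+|-+$/g, '');      // trim leading/trailing dashes
  return '/people/' + slug + '.json';
}

// Transform-shaped function: takes the content, replaces its URI, returns it.
function setStableUri(content, context) {
  const value = content.value;
  // In MarkLogic the value is a document node; outside it, a plain object.
  const record =
    value && typeof value.toObject === 'function' ? value.toObject() : value;
  content.uri = buildUri(record);
  return content;
}

// Placeholder input URI, used only to show that it gets replaced.
const result = setStableUri(
  {
    uri: '/placeholder/original-uri',
    value: { firstname: 'Rajath4', address1: 'No 36B', city: 'Coorg', state: 'KA', country: 'IN' }
  },
  {}
);
console.log(result.uri);
// prints /people/rajath4-no-36b-coorg.json
```

In a real module you would export the function (for example `exports.transform = setStableUri;`) and point MLCP at it. Note that inserting at an existing URI replaces the stored document, so a true merge would need extra logic in the transform to read the existing document and combine it with the new one.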
Fun note: 4.17.6 also explains how to change the document type. If you are an XSLT person, you might decide to have MLCP provide XML, then use a template to purge the empty elements, and finally transform and save the result as a JSON object.
A note of caution: if you use MLCP with the fastload option, then I think changing the URI will negate the benefit of fastload (or something along those lines).