marklogic, marklogic-9, mlcp

How to remove a column from a csv file while loading a file?


I want to remove a particular column from a CSV file and load the file into the database using mlcp.

My csv file contains:

URI,EmpId,Name,age,gender,salary
1/Niranjan,1,Niranjan,35,M,1000
2/Deepan,2,Deepan,25,M,2000
3/Mehul,3,Mehul,28,M,3000

I want to use the URI column as the URI of each document, and that column should be left out of the inserted document.

How can I do this?


Solution

  • Your best bet when using MLCP outside a MarkLogic Data Hub context is an MLCP transform. You can find an explanation and a few examples here:

    Transforming Content During Ingestion

    If you are converting your CSV to JSON, you could use something like the following.

    Save this as /strip-columns.sjs in your modules database:

    /* jshint node: true */
    /* global xdmp */
    
    exports.transform = function(content, context) {
      'use strict';
    
      /* jshint camelcase: false */
      var stripColumns = (context.transform_param !== undefined) ? context.transform_param.split(/,/) : [];
      /* jshint camelcase: true */
    
      // detect JSON, assumes uri has correct extension
      if (xdmp.uriFormat(content.uri) === 'json') {
    
        // Convert input to mutable object for manipulation
        var newDoc = content.value.toObject();
        Object.keys(newDoc)
        .forEach(function(key) {
          if (stripColumns.indexOf(key) > -1) {
            delete newDoc[key];
          }
        });
    
        // Convert result back into a document
        content.value = newDoc;
    
      }
    
      // return updated content object
      return content;
    };
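
    To check the stripping logic without a MarkLogic server, the same key filtering can be tried on a plain object in Node.js. This is only a sketch: `stripColumns` is a hypothetical helper name (not part of MLCP or MarkLogic), and the row object assumes each CSV value arrives as a string property named after its column header.

    ```javascript
    // Standalone sketch of the column-stripping step, runnable in plain Node.js.
    // stripColumns is a hypothetical helper; it is not an MLCP or MarkLogic API.
    function stripColumns(row, columnsParam) {
      // Same parsing as the transform: a comma-separated list, or nothing at all.
      var names = (columnsParam !== undefined) ? columnsParam.split(/,/) : [];
      var result = {};
      Object.keys(row).forEach(function (key) {
        // Copy only the properties that are not listed for removal.
        if (names.indexOf(key) === -1) {
          result[key] = row[key];
        }
      });
      return result;
    }

    // First data row of the sample CSV, as an object of string values (assumed shape).
    var row = { URI: '1/Niranjan', EmpId: '1', Name: 'Niranjan', age: '35', gender: 'M', salary: '1000' };
    var stripped = stripColumns(row, 'URI');
    console.log(JSON.stringify(stripped));
    ```

    Unlike the transform, this version builds a new object instead of deleting keys in place, so the input row stays untouched. Passing several names, for example 'URI,age', removes all of them.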
    

    And then you'd invoke it with something like this:

    mlcp.sh import -input_file_path test.csv -input_file_type delimited_text \
      -uri_id URI -document_type json \
      -output_uri_prefix / -output_uri_suffix .json \
      -output_collections data,type/csv,format/json \
      -output_permissions app-user,read \
      -transform_module /strip-columns.sjs -transform_param URI
    

    HTH!