Tags: csv, uri, marklogic, marklogic-8, marklogic-dhf

MarkLogic: Is it possible to use more than one column from a CSV file when generating the URI ID during data ingestion?


I am quite new to MarkLogic and not sure how best to deal with the challenge I am facing.

I have a CSV file, exported from a database table, that will be ingested into a MarkLogic database. The source table uses a composite primary key made up of four columns.

In MarkLogic, by default, only one column from the CSV file can be used as the URI ID.

My question is: is it possible to use more than one column from a CSV file as the URI ID during data ingestion in MarkLogic? If so, is this feature or setting available in Data Hub? If it is not possible, what is the usual best practice for this in MarkLogic?

I know that one possible workaround is to add a new column that combines the values of the four primary-key columns and use it as the URI ID.
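For illustration, that workaround can be done as a pre-processing step before ingestion. The following is a minimal Python sketch, not a MarkLogic feature: the key column names `col_a` to `col_d`, the new column name `uri_id`, and the `_` separator are all hypothetical placeholders. It copies each row and appends a column that joins the key values, which the ingestion can then use as its single URI ID column.

```python
import csv
import io

# Hypothetical placeholders: replace with the four primary-key columns of your export.
KEY_COLUMNS = ["col_a", "col_b", "col_c", "col_d"]
# Pick a separator that never occurs in the key values, so that
# ("ab", "c") and ("a", "bc") cannot produce the same ID.
SEPARATOR = "_"


def add_composite_key(src, dst, key_columns=KEY_COLUMNS, new_column="uri_id"):
    """Copy CSV rows from src to dst, appending a column that joins the key columns."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or []) + [new_column]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row[new_column] = SEPARATOR.join(row[c] for c in key_columns)
        writer.writerow(row)


# Usage with in-memory data; for real files, open both with newline="".
src = io.StringIO("col_a,col_b,col_c,col_d,name\n1,2,3,4,x\n")
dst = io.StringIO()
add_composite_key(src, dst)
print(dst.getvalue())
```

Note that `csv.DictWriter` writes `\r\n` line endings by default; pass `lineterminator="\n"` if the ingestion tool expects Unix line endings.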


Solution

  • You can use MLCP transforms to change both the document content and its URI. The transform function receives a map, $content, whose "uri" and "value" entries hold the document URI and the document itself. Update those entries as needed and return the map. Something like:

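    xquery version "1.0-ml";
    module namespace example = "http://marklogic.com/example";

    declare function example:transform(
      $content as map:map,
      $context as map:map
    ) as map:map*
    {
      (: "value" holds the document built from one CSV row; prop1..prop3 stand in
         for your key columns. The // step finds them whether the row was loaded
         as XML (wrapped in a root element) or as JSON. :)
      let $record := map:get($content, "value")
      let $keys :=
        for $k in ($record//prop1, $record//prop2, $record//prop3)
        return fn:string($k)
      (: Join with a separator so ("ab","c") and ("a","bc") give different URIs :)
      let $uri := fn:string-join($keys, "_")
      let $_ := map:put($content, "uri", $uri)
      return $content
    };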
    

    You can use the same MLCP transforms with MarkLogic Data Hub as well.

    HTH!