marklogic marklogic-8 mlcp

MarkLogic Content Pump and XSLT transformation


I am using MarkLogic Content Pump (mlcp) to ingest XML documents. I would like to transform these XML documents during ingestion using the -transform_module and -transform_namespace options. I have already created the XSLT for the transformation and loaded it into the MarkLogic modules database, but mlcp does not accept the XSLT file and throws an error.

Command:

    mlcp.sh import \
      -username $username -password $passwd \
      -host $host -port $port \
      -input_file_path $inpath \
      -input_compressed true \
      -input_file_type aggregates \
      -aggregate_record_element $splittag \
      -aggregate_uri_id $uriid \
      -aggregate_record_namespace "http://www.fda.gov/cdrh/gudid" \
      -output_collections $collection \
      -output_permissions my-app-role,read,my-app-role,update \
      -output_uri_suffix .xml \
      -transform_module /marklogic.rest.transform/xml-transform-xsl/assets/transform.xsl \
      -transform_namespace "http://marklogic.com/rest-api/transform/xml-transform-xsl" \
      -transform_function transform

The following error is thrown:

    15/09/27 15:34:19 WARN mapreduce.ContentWriter: XDMP-MODNOTTEXT: Module /marklogic.rest.transform/fda-transform-xsl/assets/transform.xsl is not a text document

Does mlcp accept an XSLT transformation? If not, what is the alternative?

MarkLogic also creates an equivalent .xqy file in the modules database. When I point mlcp at that .xqy file (shown below) instead, a parameter mismatch error is thrown. I think this is due to the wrong return type:

    xquery version "1.0-ml";
    module namespace simple-xsl = "http://marklogic.com/rest-api/transform/simple-xsl";
    import module namespace extut = "http://marklogic.com/rest-api/lib/extensions-util"
        at "/MarkLogic/rest-api/lib/extensions-util.xqy";
    declare namespace xsl = "http://www.w3.org/1999/XSL/Transform";
    declare default function namespace "http://www.w3.org/2005/xpath-functions";
    declare option xdmp:mapping "false";
    declare private variable $transform-uri := "/marklogic.rest.transform/fda-transform-xsl/assets/transform.xsl";
    declare function fda-transform-xsl:transform(
        $context as map:map,
        $params  as map:map,
        $content as document-node()
    ) as document-node()?
    {
        extut:execute-transform($transform-uri,$context,$params,$content)
    };

Solution

  • I don't think you can point Content Pump's -transform_module directly at an XSLT stylesheet. I think it expects an XQuery module (cf. https://docs.marklogic.com/guide/ingestion/content-pump#id_82518).

    You should be able to set up a custom transform XQuery module and call your XSLT from there, using xdmp:xslt-invoke() on the document held under the "value" key of the $content map that Content Pump passes in (cf. http://docs.marklogic.com/xdmp:xslt-invoke). You would then set -transform_module to point to that custom transform XQuery module rather than directly at the XSL stylesheet.

    Note that if you use -input_file_type aggregates, as in your example, your custom transform will be applied to each fragment split out by -aggregate_record_element ($splittag in your command). So the document in the incoming $content map will be the individual fragment you're splitting (and transforming) on, not the whole input file.
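
    The wrapper module described above could look roughly like the sketch below. It assumes the mlcp transform signature from the Content Pump guide (a $content map with "uri" and "value" keys, plus a $context map) and that the stylesheet is reachable at the path from the error message. The namespace URI and the xt prefix are hypothetical, chosen only for illustration.

    ```xquery
    xquery version "1.0-ml";

    (: Hypothetical namespace; it must match the value given to -transform_namespace. :)
    module namespace xt = "http://example.com/mlcp/xslt-transform";

    (: Assumed stylesheet location in the modules database (taken from the error message). :)
    declare variable $XSLT-URI as xs:string :=
      "/marklogic.rest.transform/fda-transform-xsl/assets/transform.xsl";

    (: mlcp calls this once per document; with aggregates, once per split-out record. :)
    declare function xt:transform(
      $content as map:map,
      $context as map:map
    ) as map:map*
    {
      (: "value" holds the incoming document node. :)
      let $doc := map:get($content, "value")
      (: Run the stylesheet against that document. :)
      let $result := xdmp:xslt-invoke($XSLT-URI, $doc)
      (: Replace the document to be inserted with the transformed output. :)
      let $_ := map:put($content, "value", $result)
      return $content
    };
    ```

    If this module were loaded into the modules database as a text document at, say, /ext/mlcp/xslt-transform.xqy (a made-up path), the command would use -transform_module /ext/mlcp/xslt-transform.xqy, -transform_namespace "http://example.com/mlcp/xslt-transform" and -transform_function transform. The parameter mismatch with the generated REST module is probably because its transform function takes three arguments ($context, $params, $content), whereas mlcp calls the function with two.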