MarkLogic Content Pump: generate multiple documents through an XSLT transform


This is my second question about the MarkLogic Content Pump utility.

I am ingesting a single aggregated XML document with multiple records using MarkLogic Content Pump. I expect the aggregate XML document to be transformed to a different format, and I also expect Content Pump to generate multiple XML documents from the single large input XML document.

Example aggregated input XML document:

<root>
 <data>Bob</data>
 <data>Vishal</data>
</root>

Expected output from Content Pump: two documents in a different format.

Document 1:

<data1>Bob</data1>

Document 2:

<data1>Vishal</data1>

I am using the following XSLT to split the above document into two nodes:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">
  <xsl:template match="root">
    <xsl:apply-templates select="data"></xsl:apply-templates>
  </xsl:template>
  <xsl:template match="data">
    <data1><xsl:value-of select="."/></data1>
  </xsl:template>
</xsl:stylesheet>

Output:

<?xml version="1.0" encoding="UTF-8"?>
<data1>Bob</data1>
<data1>Vishal</data1>

The following XQuery transform calls the above XSLT file to generate the two nodes:

xquery version "1.0-ml";
module namespace example = "http://marklogic.com/example";

declare function example:transform(
  $content as map:map,
  $context as map:map
) as map:map*
{
  let $attr-value := 
    (map:get($context, "transform_param"), "UNDEFINED")[1]
  let $the-doc := map:get($content, "value")

  let $let-output:=  xdmp:xslt-invoke("/marklogic.rest.transform/simple-xsl/assets/transform.xsl", $the-doc )
  return (map:put(
          $content, "value",
          $let-output
        ),$content)

};

The above XQuery transform fails and returns an error. How do I modify it so that it generates and indexes multiple transformed XML documents from a single input document?

MLCP Command:

mlcp.sh import -host localhost -port 8040 \
    -username admin -password admin \
    -input_file_path ./parent-form.xml \
    -transform_module /example/parent-transform.xqy \
    -transform_namespace "http://marklogic.com/example" \
    -transform_param "my-value" \
    -output_collections people \
    -output_permissions my-app-role,read,my-app-role,update 

Solution

  • The transform you provided returns a single document containing multiple root elements. The XSLT itself runs, but MarkLogic will not allow inserting that result into the database and throws XDMP-MULTIROOT: Document nodes cannot have multiple roots.

    There are two ways to solve that. The simplest is to append /* to the xdmp:xslt-invoke call, which selects each root element separately. The other is to use <xsl:result-document href="{generate-id()}.xml"> inside your XSLT. Both will cause $let-output to contain a sequence instead of just a single document.

    However, without further changes that will result in XDMP-CONFLICTINGUPDATES, because multiple results would be written to the same database URI. To solve that, you can clone the $content map:map with a small trick and give each copy its own URI. For instance like this:

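    (: $the-uri was not defined in the original transform; take it from the incoming map :)
    let $the-uri := map:get($content, "uri")
    for $let-output at $i in xdmp:xslt-invoke("/marklogic.rest.transform/simple-xsl/assets/transform.xsl", $the-doc )/*
    (: serialize the map and rebuild it, which yields an independent copy :)
    let $extra-content := map:map(document{$content}/*)
    let $_ := map:put($extra-content, "value", $let-output)
    let $_ := map:put($extra-content, "uri", concat($the-uri, '-', $i, '.xml') )
    return
      $extra-content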
    

    Note: the transform function has a return type of map:map*, meaning you can return zero or more map:map values, each containing one result document.
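
    Putting the pieces together, the whole transform module might look roughly like the sketch below. It is untested and rests on a few assumptions: that the incoming $content map carries the document URI under the "uri" key, that the XSLT lives at the module path used in the question, and that the -transform_param value is not needed (it was read but never used in the original, so it is dropped here).

```xquery
xquery version "1.0-ml";
module namespace example = "http://marklogic.com/example";

declare function example:transform(
  $content as map:map,
  $context as map:map
) as map:map*
{
  let $the-doc := map:get($content, "value")
  (: assumption: the URI assigned to the input file is stored under "uri" :)
  let $the-uri := map:get($content, "uri")
  (: "/*" turns the multi-root XSLT result into a sequence of elements :)
  for $result at $i in xdmp:xslt-invoke(
      "/marklogic.rest.transform/simple-xsl/assets/transform.xsl",
      $the-doc)/*
  (: clone the incoming map so every result gets its own map :)
  let $extra-content := map:map(document { $content }/*)
  let $_ := map:put($extra-content, "value", $result)
  (: unique URI per result, which avoids XDMP-CONFLICTINGUPDATES :)
  let $_ := map:put($extra-content, "uri", concat($the-uri, "-", $i, ".xml"))
  return $extra-content
};
```

    With the sample input from the question, this should return two maps, with URIs ending in -1.xml and -2.xml, each holding one <data1> element.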

    HTH!