Use "selector modules" with the MarkLogic Data Movement SDK [Java] [MarkLogic] [dmsdk] [data-movement-sdk] [ml-java-api]


I'm using the Data Movement SDK from the MarkLogic Java API to transform several documents. So far I can transform documents by using a query batcher and a transform, but I'm only able to select URIs with StructuredQuery objects.

My question is: how can I use a selector module from my database instead of defining the selector in my Java application?

Update: So far I have code that looks up document URIs and applies a transform to them. I want to change that query batcher so that it uses a selector module instead of looking for all documents in a directory:

public TransformExecutionResults applyTransformByModule(String transformName, String filterText, int batchSize, int threadCount, String selectorModuleName, Map<String,String> parameters ) {
    final ConcurrentHashMap<String, TransformExecutionResults> transformResult = new ConcurrentHashMap<>();

    try {
        // Specify a server-side transformation module (stored procedure) by name
        ServerTransform transform = new ServerTransform(transformName);
        ApplyTransformListener transformListener = new ApplyTransformListener().withTransform(transform).withApplyResult(ApplyResult.REPLACE) // Transform in-place, i.e. rewrite
                .onSuccess(batch -> {
                    transformResult.compute(transformName, (k, v) -> TransformExecutionResults.Success);
                    System.out.println("Transformation " + transformName + " executed successfully.");
                }).onSkipped(batch -> {
                    System.out.println("Transformation " + transformName + " skipped successfully.");
                    transformResult.compute(transformName, (k, v) -> TransformExecutionResults.Skipped);
                }).onFailure((batchListener, throwable) -> {
                    System.err.println("Transformation " + transformName + " executed with errors.");
                    transformResult.compute(transformName, (k, v) -> TransformExecutionResults.Failed); // failed
                });

        // Apply the transformation to only the documents that match a query.
        QueryManager qm = DbClient.newQueryManager();
        StructuredQueryBuilder sqb = qm.newStructuredQueryBuilder();

        // Instead of this StructuredQueryDefinition, I want to use a module to get all URIs
        StructuredQueryDefinition queryBySubdirectory = sqb.directory(true, "/temp/" + filterText + "/"); 

        final QueryBatcher batcher = DMManager.newQueryBatcher(queryBySubdirectory);

        batcher.withBatchSize(batchSize);
        batcher.withThreadCount(threadCount);
        batcher.withConsistentSnapshot();
        batcher.onUrisReady(transformListener).onQueryFailure(exception -> {
            exception.printStackTrace();
            System.out.println("There was an error on Transform process.");
        });

        final JobTicket ticket = DMManager.startJob(batcher);
        batcher.awaitCompletion();
        DMManager.stopJob(ticket);
    } catch (Exception fault) {
        transformResult.compute(transformName, (k, v) -> TransformExecutionResults.GeneralException); // general exception
    }

    return transformResult.get(transformName);
}

Solution

  • If the job is small enough, you can implement the document rewriting in your e-node (server-side) code, either by calling a resource service extension:

    http://docs.marklogic.com/guide/java/resourceservices#id_27702

    http://docs.marklogic.com/javadoc/client/com/marklogic/client/extensions/ResourceServices.html

    or by invoking a main module:

    http://docs.marklogic.com/guide/java/resourceservices#id_84134

    If the job is too long to fit in a single transaction, you can create a QueryBatcher with a document URI iterator instead of a query. See:

    http://docs.marklogic.com/javadoc/client/com/marklogic/client/datamovement/DataMovementManager.html#newQueryBatcher-java.util.Iterator-

    For some examples illustrating the approach, see the second half of the second example in the class description for QueryBatcher:

    http://docs.marklogic.com/javadoc/client/com/marklogic/client/datamovement/QueryBatcher.html

    as well as the second half of this example:

    http://docs.marklogic.com/javadoc/client/com/marklogic/client/datamovement/UrisToWriterListener.html

    In your case, you could implement an Iterator that calls a resource service extension or invokes a main module to get and return the URIs (preferably with read-ahead), blocking when necessary.

    Returning the URIs to the client also makes it easy to log them for a later audit.
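As a rough illustration of the read-ahead iterator idea, here is a minimal sketch in plain Java. The names ReadAheadUriIterator, pageFetcher and bufferSize are hypothetical, and so is the convention of fetching 1-based "pages" of URIs until an empty page comes back. The pageFetcher function stands in for whatever call you make to your resource service extension or main module; that call is left out because it depends on how your selector module returns its results.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.IntFunction;

/**
 * Sketch of a read-ahead URI iterator. A background thread fetches pages of
 * URIs and fills a bounded buffer; the consumer blocks only when the buffer
 * is empty. Assumes a single consumer thread.
 */
class ReadAheadUriIterator implements Iterator<String> {
    // End-of-stream marker; compared by identity and never returned to callers.
    private static final String END = new String("<end>");

    private final BlockingQueue<String> buffer;
    private volatile Throwable failure; // set if the fetcher fails
    private String pending;             // element taken from the buffer but not yet returned

    /**
     * @param pageFetcher hypothetical function: page number (1-based) to a list
     *                    of URIs; an empty or null list means "no more URIs"
     * @param bufferSize  how many URIs to read ahead at most
     */
    ReadAheadUriIterator(IntFunction<List<String>> pageFetcher, int bufferSize) {
        this.buffer = new ArrayBlockingQueue<>(bufferSize);
        Thread producer = new Thread(() -> {
            try {
                for (int page = 1; ; page++) {
                    List<String> uris = pageFetcher.apply(page);
                    if (uris == null || uris.isEmpty()) {
                        break;
                    }
                    for (String uri : uris) {
                        buffer.put(uri); // blocks while the buffer is full
                    }
                }
            } catch (InterruptedException | RuntimeException e) {
                failure = e; // reported to the consumer once the buffer drains
            } finally {
                try {
                    buffer.put(END); // always signal end of stream
                } catch (InterruptedException ignored) {
                    // interrupted again while signalling; nothing more to do
                }
            }
        }, "uri-read-ahead");
        producer.setDaemon(true); // do not keep the JVM alive for an abandoned iterator
        producer.start();
    }

    @Override
    public boolean hasNext() {
        if (pending == null) {
            try {
                pending = buffer.take(); // blocks until a URI or END arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for URIs", e);
            }
        }
        if (pending == END) {
            if (failure != null) {
                throw new IllegalStateException("Fetching URIs failed", failure);
            }
            return false;
        }
        return true;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String uri = pending;
        pending = null;
        return uri;
    }
}
```

An instance of this class could then be passed to the DataMovementManager.newQueryBatcher(Iterator<String>) overload linked above, with the existing ApplyTransformListener registered through onUrisReady as in the question. A bounded buffer keeps the client from holding every URI in memory when the selector returns a very large result.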

    Hoping that helps,