
Is there a compress option for ExportListener in the MarkLogic Java Client API?


I want to export all the documents from my MarkLogic database using the Data Movement SDK (DMSDK). I exported them successfully as individual files, but I want to compress them into a zip file through DMSDK. I searched the documentation for a compress option but didn't find one.

Updated Code

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.DatabaseClientFactory.Authentication;
import com.marklogic.client.datamovement.DataMovementManager;
import com.marklogic.client.datamovement.ExportListener;
import com.marklogic.client.datamovement.QueryBatcher;
import com.marklogic.client.io.StringHandle;
import com.marklogic.client.query.QueryManager;
import com.marklogic.client.query.StringQueryDefinition;

public class Extract {

    // replace with your MarkLogic Server connection information (host, port, user, password)
    static DatabaseClient client =
            DatabaseClientFactory.newClient("x", 8000,
                                            "x", "x",
                                            Authentication.DIGEST);

    private static String EX_DIR = "F:/JavaExtract";

    // Export the documents from the database asynchronously
    public static void exportByQuery() {
        DataMovementManager dmm = client.newDataMovementManager();

        // Construct a collection query with which to drive the job.
        QueryManager qm = client.newQueryManager();
        StringQueryDefinition query = qm.newStringDefinition();
        query.setCollections("GOT");

        // Create and configure the batcher
        QueryBatcher batcher = dmm.newQueryBatcher(query);
        batcher.withBatchSize(1000)
            .withThreadCount(10)
            .onUrisReady(
                new ExportListener()
                    .onDocumentReady(doc -> {
                        String[] uriParts = doc.getUri().split("/");
                        try {
                            FileOutputStream dest =
                                new FileOutputStream("F:/Json/file.zip");
                            ZipOutputStream out =
                                new ZipOutputStream(new BufferedOutputStream(dest));
                            ZipEntry entry = new ZipEntry(uriParts[uriParts.length - 1]);
                            out.putNextEntry(entry);

                            byte[] data = doc.getContent(new StringHandle()).toBuffer();
                            out.write(data, 0, data.length);
                            out.closeEntry();
                            out.close();
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }))
            .onQueryFailure(exception -> exception.printStackTrace());

        dmm.startJob(batcher);

        // Wait for the job to complete, and then stop it.
        batcher.awaitCompletion();
        dmm.stopJob(batcher);
    }

    public static void main(String[] args) {
        exportByQuery();
    }
}

When I run it, only the last document in the GOT collection ends up in the zip file, rather than all of them.

Any help is appreciated.

Thanks


Solution

  • You're really close. Just use standard Java zip writing rather than Files.write. The top two answers here look really good: How to create a zip file in Java

    Another option is WriteToZipConsumer. That would replace all your code in the onDocumentReady call.
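
    For reference, a rough sketch of that approach might look like the following. This assumes WriteToZipConsumer accepts the target File in its constructor and implements Consumer<DocumentRecord>, so it can be passed straight to onDocumentReady, and that it can be closed once the job stops; check the Javadoc for the exact signatures.

        WriteToZipConsumer zipConsumer =
            new WriteToZipConsumer(new File("F:/Json/file.zip"));

        QueryBatcher batcher = dmm.newQueryBatcher(query);
        batcher.withBatchSize(1000)
            .withThreadCount(10)
            .onUrisReady(new ExportListener().onDocumentReady(zipConsumer))
            .onQueryFailure(exception -> exception.printStackTrace());

        dmm.startJob(batcher);
        batcher.awaitCompletion();
        dmm.stopJob(batcher);

        // the consumer keeps the zip stream open across batches, so close it after the job stops
        zipConsumer.close();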

    [UPDATE based on updated question] Your onDocumentReady listener is run for each document, so I'm guessing it doesn't make sense to create a new FileOutputStream("F:/Json/file.zip"); for each document. That's why you're only seeing the last document when you're done. Try moving these two lines to before you initialize your batcher:

        final FileOutputStream dest = new FileOutputStream("F:/Json/file.zip");
        final ZipOutputStream out = new ZipOutputStream(new BufferedOutputStream(dest));
    

    That way they'll only run once.

    Also, move this to after dmm.stopJob(batcher):

                          out.close();
    

    Also, surround your listener code in a synchronized(out) {...} block so the threads won't overwrite each other as they write to the stream. Remember, your listener runs in 10 threads in parallel, so the code inside it needs to be thread-safe.
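
    Putting those pieces together, a rough sketch of the reworked export could look like this (same file path and batch settings as in your question; assume it lives in a method that can throw or handle IOException):

        final FileOutputStream dest = new FileOutputStream("F:/Json/file.zip");
        final ZipOutputStream out = new ZipOutputStream(new BufferedOutputStream(dest));

        QueryBatcher batcher = dmm.newQueryBatcher(query);
        batcher.withBatchSize(1000)
            .withThreadCount(10)
            .onUrisReady(
                new ExportListener()
                    .onDocumentReady(doc -> {
                        String[] uriParts = doc.getUri().split("/");
                        byte[] data = doc.getContent(new StringHandle()).toBuffer();
                        // the listener runs in 10 threads, so serialize writes to the shared stream
                        synchronized (out) {
                            try {
                                out.putNextEntry(new ZipEntry(uriParts[uriParts.length - 1]));
                                out.write(data, 0, data.length);
                                out.closeEntry();
                            } catch (Exception e) {
                                e.printStackTrace();
                            }
                        }
                    }))
            .onQueryFailure(exception -> exception.printStackTrace());

        dmm.startJob(batcher);
        batcher.awaitCompletion();
        dmm.stopJob(batcher);

        // close the zip only after the job has fully stopped
        out.close();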