
Java: how to calculate sha1 digest on-the-fly on a stream that is being saved?


I have a servlet written in Java that accepts a file posted as a multipart form; the file needs to be saved in MongoDB/GridFS. I already have working code for this.

Here is a code fragment that shows how it is done using the org.apache.commons.fileupload package. It consumes almost no memory, because each item is read as a stream and handed straight to GridFS instead of being buffered in full.

        // Streaming API: items are read directly from the request, not buffered
        ServletFileUpload upload = new ServletFileUpload();
        FileItemIterator iter = upload.getItemIterator(req);
        while (iter.hasNext()) {
            FileItemStream item = iter.next();
            String name = item.getFieldName();
            InputStream stream = item.openStream();
            if (item.isFormField()) {
                // Ordinary form field: read its value as a string
                toProcess.put(name, Streams.asString(stream));
            } else {
                // File field: pass the stream straight to GridFS
                String fileName = item.getName();
                String contentType = item.getHeaders().getHeader("Content-Type");
                GridFSUploadOptions options = new GridFSUploadOptions()
                        // .chunkSizeBytes(358400)
                        .metadata(new Document("content_type", contentType));
                ObjectId fileId = gridFSFilesBucket.uploadFromStream(fileName, stream, options);
                fileIds.add(fileId);
                fileNames.add(fileName);
            }
        }

I also need to calculate SHA-1 hash values for all files. DigestUtils from Apache Commons Codec could be used for this. It has a method that calculates SHA-1 over a stream:

https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/digest/DigestUtils.html#sha1-java.io.InputStream-

My problem is that this method consumes the stream entirely. I need to split the input stream into two parts: feed one into the SHA-1 calculation and the other into the GridFS bucket.

How can I do that? I was thinking about creating my own "pipe" that has an input and an output stream and forwards all data, but updates the digest on the fly.

I just don't know how to start writing such a pipe.


Solution

  • You can use the Java API class java.security.DigestInputStream.

    As the Javadoc explains:

    A transparent stream that updates the associated message digest using the bits going through the stream.

    To complete the message digest computation, call one of the digest methods on the associated message digest after your calls to one of this digest input stream's read methods.

    In your code you can do this:

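    InputStream stream = item.openStream();
    // "SHA-1" matches the question; other standard names such as "SHA-256" work the same way
    MessageDigest digest = MessageDigest.getInstance("SHA-1");
    // Every byte read through the wrapper also updates the digest
    stream = new DigestInputStream(stream, digest);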
    

    And at the end, once the stream has been read completely (for example, after uploadFromStream has returned), you can get the digest with:

    byte[] hash = digest.digest();
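
    To make the idea concrete, here is a minimal, self-contained sketch that may help. It is not the servlet code from the question: a ByteArrayOutputStream stands in for the GridFS bucket, and the class name DigestOnTheFly and the helpers copyAndDigest and toHex are hypothetical names chosen for illustration. It shows that the bytes reach the consumer unchanged while the digest is updated as a side effect of reading.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; the names are illustrative, not from any library.
class DigestOnTheFly {

    // Copies all bytes from 'in' to 'out' and returns the digest of the copied bytes.
    // 'out' stands in for whatever consumes the stream (GridFS in the question).
    static byte[] copyAndDigest(InputStream in, OutputStream out, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        try (DigestInputStream din = new DigestInputStream(in, digest)) {
            byte[] buffer = new byte[8192];
            int n;
            // Each read through 'din' also feeds the bytes into 'digest'
            while ((n = din.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        // Only valid once the stream has been read to the end
        return digest.digest();
    }

    // Formats a byte array as lowercase hexadecimal.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] hash = copyAndDigest(new ByteArrayInputStream(data), sink, "SHA-1");
        System.out.println(toHex(hash)); // a9993e364706816aba3e25717850c26c9cd0d89d
        System.out.println(sink.size()); // 3
    }
}
```

    In the servlet, the equivalent would presumably be to pass the wrapped stream to uploadFromStream and call digest.digest() after that call returns, once per uploaded file with a fresh MessageDigest each time.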