
Using scalaz-stream to calculate a digest


How can I use scalaz-stream to generate the digest of a file using java.security.MessageDigest?

I would like to do this using a constant memory buffer size (for example 4KB). I think I understand how to start with reading the file, but I am struggling to understand how to:

1) call digest.update(buf) for each 4KB chunk. This is effectively a side effect on the Java MessageDigest instance, which I guess should happen inside the scalaz-stream framework.

2) finally call digest.digest() to get the calculated digest back from within the scalaz-stream framework.

I think I understand roughly how to start:

import scalaz.stream._
import java.security.MessageDigest

val f = "/a/b/myfile.bin"
val bufSize = 4096

val digest = MessageDigest.getInstance("SHA-256")

Process.constant(bufSize).toSource
  .through(io.fileChunkR(f, bufSize))

But then I am stuck! Any hints please? I guess it must also be possible to wrap the creation, update, retrieval (of the actual digest calculation) and destruction of the digest object in a scalaz-stream Sink or something similar, and then call .to() passing in that Sink? Sorry if I am using the wrong terminology; I am completely new to scalaz-stream. I have been through a few of the examples but am still struggling.


Solution

  • Since version 0.4, scalaz-stream contains processes to calculate digests. They are available in the hash module and use java.security.MessageDigest under the hood. Here is a minimal example of how you could use them:

    import scalaz.concurrent.Task
    import scalaz.stream._
    
    object Sha1Sum extends App {
      val fileName = "testdata/celsius.txt"
      val bufferSize = 4096
    
      val sha1sum: Task[Option[String]] =
        Process.constant(bufferSize)
          .toSource
          .through(io.fileChunkR(fileName, bufferSize))
          .pipe(hash.sha1)
          .map(sum => s"${sum.toHex}  $fileName")
          .runLast
    
      sha1sum.run.foreach(println)
    }
    

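    The update() and digest() calls are all contained inside the hash.sha1 Process1, so your own pipeline needs no side-effecting code. For the SHA-256 digest from the question, pipe through hash.sha256 instead.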
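
For comparison, here is a rough plain-JDK sketch of the update()/digest() loop that a digest process has to perform internally. It is not the scalaz-stream implementation, only an illustration of the same constant-memory idea. The names DigestSketch and sha256Hex are made up for this example, and it reads from any InputStream (for a file you would pass a java.io.FileInputStream).

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.security.MessageDigest

// Hypothetical helper object, not part of scalaz-stream.
object DigestSketch {

  // Reads the stream in chunks of at most bufSize bytes, so memory use
  // stays constant regardless of the input size.
  def sha256Hex(in: InputStream, bufSize: Int = 4096): String = {
    val digest = MessageDigest.getInstance("SHA-256")
    val buf = new Array[Byte](bufSize)
    var n = in.read(buf)
    while (n != -1) {
      // Step 1 from the question: update the digest with each chunk.
      digest.update(buf, 0, n)
      n = in.read(buf)
    }
    // Step 2 from the question: finish and render the result as hex.
    digest.digest().map(b => f"${b & 0xff}%02x").mkString
  }

  def main(args: Array[String]): Unit = {
    val in = new ByteArrayInputStream("abc".getBytes("UTF-8"))
    try println(sha256Hex(in)) finally in.close()
  }
}
```

The mutable MessageDigest and the while loop are exactly the side effects that hash.sha1 and its siblings hide, which is why the stream version above needs neither.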