I have weird observation about scalaz-streams sinks. They are working slow. Does anyone know why is that? And is there any way to improve the performance?
here are relevant parts of my code: version without sink
//p is parameter with type p: Process[Task, Pixel]
def printToImage(img: BufferedImage)(pixel: Pixel): Unit = {
img.setRGB(pixel.x, pixel.y, 1, 1, Array(pixel.rgb), 0, 0)
}
val image = getBlankImage(2000, 4000)
val result = p.runLog.run
result.foreach(printToImage(image))
this takes ~7s to execute
version with sink
//p is the same as before
def printToImage(img: BufferedImage)(pixel: Pixel): Unit = {
img.setRGB(pixel.x, pixel.y, 1, 1, Array(pixel.rgb), 0, 0)
}
//I've found that way of doing sink in some tutorial
def getImageSink(img: BufferedImage): Sink[Task, Pixel] = {
//I've tried here Task.delay and Task.now with the same results
def printToImageTask(img: BufferedImage)(pixel: Pixel): Task[Unit] = Task.delay {
printToImage(img)(pixel)
}
Process.constant(printToImageTask(img))
}
val image = getBlankImage(2000, 4000)
val result = p.to(getImageSink(image)).run.run
this one takes 33 seconds to execute. I am totally confused here because of that significant difference.
In second case you are allocating Task for each pixel, and instead of directly calling printToImage you do it through Task, and it's much more steps in a call-chain.
We use scalaz-stream a lot, but I strongly believe that it's overkill to use it for this type problems. Code running inside Process/Channel/Sink should much more complicated than simple variable assignment/update.
We use Sinks to write data from stream into databases (Cassandra) and we use batching, it's to high overhead to write individual rows. Process/Sinks is super convenient abstraction, but for more high level workflows. When it's easy to write for-loop I would suggest to write for-loop.