I have this code that combines a collection of images into one. I want to restructure this sequential code into a parallel/distributed application, as my image collection is quite large (big data :-) ). I'm contemplating Map/Reduce, but I'm not sure whether this kind of operation fits that model.
# Sequential code
Result.Image <- NULL
for (Image in Image.Collection) {
  Result.Image <- CombineImage(Result.Image, Image)
}
Note: order does not matter; combining images 1, 2, 3, 4, 5 gives the same result as combining images 2, 3, 1, 4, 5.
Ideally I would like something like this (it looks more like classic divide et impera than like map/reduce):
1, 2, 3, 4 are the original images. One node combines image #1 and image #2 into a new image, #5. A second node combines image #3 and image #4 into image #6, and finally a third node combines image #5 and image #6 into the final result.
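To make the shape of what I'm after concrete, here is a rough Python sketch of that pairwise combining. combine_images and tree_combine are made-up names standing in for my real CombineImage routine; the "images" are just strings and the combine is string concatenation so the snippet actually runs.

from concurrent.futures import ProcessPoolExecutor

def combine_images(a, b):
    # placeholder for the real CombineImage operation
    return a + b

def tree_combine(images):
    # combine images pairwise, level by level, until one result remains
    with ProcessPoolExecutor() as pool:
        while len(images) > 1:
            firsts = images[0::2][:len(images) // 2]
            seconds = images[1::2]
            leftover = [images[-1]] if len(images) % 2 else []
            # each (first, second) pair can go to a separate worker process
            images = list(pool.map(combine_images, firsts, seconds)) + leftover
    return images[0]

if __name__ == "__main__":
    print(tree_combine(["1", "2", "3", "4"]))  # combines (1,2) and (3,4), then the two results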
Any ideas on what framework or parallel/distributed design pattern I should use for something like this?
Cheers!
From your initial description (the sequential loop) it seems that you cannot process image #3 until you have combined #1 and #2, since you accumulate intermediate results in Result.Image. Your graph, however, tells a different story: sibling nodes can be processed in parallel, and I wonder whether even arbitrary pairs of nodes can be combined in parallel.

Regardless, I think you can put all the initial images into a FIFO queue and throw at it as many processors (threads, machines, or nodes) as you can afford. Each processor picks up two images, combines them, and puts the result back into the queue. You keep going like this until only one image is left in the queue.
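To make that concrete, here is a minimal sketch in Python using threads and a thread-safe queue (a distributed version would replace these with worker nodes and a shared message queue). combine_images and parallel_combine are hypothetical names; the combine is simulated with string concatenation so the script runs, and the exact number of combine steps (N - 1 for N images) is claimed up front so that no worker ends up waiting forever for a second input that will never arrive.

import queue
import threading

def combine_images(a, b):
    # stand-in for the real image-combining routine
    return a + b

def parallel_combine(images, num_workers=4):
    # assumes at least one image in the collection
    work = queue.Queue()
    for img in images:
        work.put(img)

    # exactly len(images) - 1 combine steps are needed in total;
    # claiming them explicitly keeps the workers from deadlocking
    remaining = [len(images) - 1]
    claim_lock = threading.Lock()

    def worker():
        while True:
            with claim_lock:
                if remaining[0] == 0:
                    return            # every combine step has been claimed
                remaining[0] -= 1     # claim one combine step
            a = work.get()            # block until two inputs are available
            b = work.get()
            work.put(combine_images(a, b))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return work.get()                 # the single remaining image

print(parallel_combine(["1", "2", "3", "4"]))

Note that Python threads won't give you true CPU parallelism for heavy image work because of the GIL; processes or a real cluster would, but the coordination pattern (shared queue, workers pulling two and pushing one) stays the same.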