To be more specific, I have the following innocuous-looking little Repa 3 program:
{-# LANGUAGE QuasiQuotes #-}
import Prelude hiding (map, zipWith)
import System.Environment (getArgs)
import Data.Word (Word8)
import Data.Array.Repa
import Data.Array.Repa.IO.DevIL
import Data.Array.Repa.Stencil
import Data.Array.Repa.Stencil.Dim2
main = do
[s] <- getArgs
img <- runIL $ readImage s
let out = output x where RGB x = img
runIL . writeImage "out.bmp" . Grey =<< computeP out
output img = map cast . blur . blur $ blur grey
where
grey = traverse img to2D luminance
cast n = floor n :: Word8
to2D (Z:.i:.j:._) = Z:.i:.j
---------------------------------------------------------------
luminance f (Z:.i:.j) = 0.21*r + 0.71*g + 0.07*b :: Float
where
(r,g,b) = rgb (fromIntegral . f) i j
blur = map (/ 9) . convolve kernel
where
kernel = [stencil2| 1 1 1
1 1 1
1 1 1 |]
convolve = mapStencil2 BoundClamp
rgb f i j = (r,g,b)
where
r = f $ Z:.i:.j:.0
g = f $ Z:.i:.j:.1
b = f $ Z:.i:.j:.2
Which takes this much time to process a 640x420 image on my 2Ghz core 2 duo laptop:
real 2m32.572s
user 4m57.324s
sys 0m1.870s
I know something must be quite wrong, because I have gotten much better performance on much more complex algorithms using Repa 2. Under that API, the big improvement I found came from adding a call to 'force' before every array transform (which I understand to mean every call to map, convolve, traverse etc). I cannot quite make out the analogous thing to do in Repa 3 - in fact I thought the new manifestation type parameters are supposed to ensure there is no ambiguity about when an array needs to be forced? And how does the new monadic interface fit into this scheme? I have read the nice tutorial by Don S, but there are some key gaps between the Repa 2 and 3 APIs that are little discussed online AFAIK.
More simply, is there a minimally impactful way to fix the above program's efficiency?
The new representation type parameters don't automagically force when needed (it's probably a hard problem to do that well) - you still have to force manually. In Repa 3 this is done with the computeP function:
computeP
:: (Monad m, Repr r2 e, Fill r1 r2 sh e)
=> Array r1 sh e -> m (Array r2 sh e)
I personally really don't understand why it's monadic, because you can just as well use Monad Identity:
import Control.Monad.Identity (runIdentity)
force
:: (Repr r2 e, Fill r1 r2 sh e)
=> Array r1 sh e -> Array r2 sh e
force = runIdentity . computeP
So, now your output
function can be rewritten with appropriate forcing:
output img = map cast . f . blur . f . blur . f . blur . f $ grey
where ...
with an abbreviation f
using a helper function u
to aid type inference:
u :: Array U sh e -> Array U sh e
u = id
f = u . force
With these changes, the speedup is quite dramatic - which is to be expected, as without intermediate forcing each output pixel ends up evaluating much more than is necessary (the intermediate values aren't shared).
Your original code:
real 0m25.339s
user 1m35.354s
sys 0m1.760s
With forcing:
real 0m0.130s
user 0m0.320s
sys 0m0.028s
Tested with a 600x400 png, the output files were identical.