Search code examples
arrayshaskellrepa

How to take an array slice with Repa over a range


I am attempting to implement a cumulative sum function using Repa in order to calculate integral images. My current implementation looks like the following:

cumsum :: (Elt a, Num a) => Array DIM2 a -> Array DIM2 a
cumsum array = traverse array id cumsum' 
    where
        elementSlice inner outer = slice array (inner :. (0 :: Int))
        cumsum' f (inner :. outer) = Repa.sumAll $ elementSlice inner outer

The problem is in the elementSlice function. In matlab or say numpy this could be specified as array[inner,0:outer]. So what I am looking for is something along the lines of:

slice array (inner :. (Range 0 outer))

However, it seems that slices are not allowed to be specified over ranges currently in Repa. I considered using partitions as discussed in Efficient Parallel Stencil Convolution in Haskell, but this seems like a rather heavyweight approach if they would be changed every iteration. I also considered masking the slice (multiplying by a binary vector) - but this again seemed like it would perform very poorly on large matrices since I would be allocating a mask vector for every point in the matrix...

My question - does anyone know if there are plans to add support for slicing over a range to Repa? Or is there a performant way I can already attack this problem, maybe with a different approach?


Solution

  • Actually, I think the main problem is that Repa does not have a scan primitive. (However, a very similar library Accelerate does.) There are two variants of scan, prefix scan and suffix scan. Given a 1D array

    [a_1, ..., a_n]

    prefix scan returns

    [0, a_0, a_0 + a_1, ..., a_0 + ... + a_{n-1} ]

    while a suffix scan produces

    [a_0, a_0 + a_1, ..., a_0 + a_1 + ... + a_n ]

    I assume this is what you are going for with your cumulative sum (cumsum) function.

    Prefix and suffix scans generalise quite naturally to multi-dimensional arrays and have an efficient implementation based on tree reduction. A relatively old paper on the topic is "Scan Primitives for Vector Computers". Also, Conal Elliott has recently written several blog posts on deriving efficient parallel scans in Haskell.

    The integral image (on a 2D array) can be calculated by doing two scans, one horizontally and one vertically. In the absence of a scan primitive I have implemented one, very inefficiently.

    horizScan :: Array DIM2 Int -> Array DIM2 Int
    horizScan arr = foldl addIt arr [0 .. n - 1]
      where 
        addIt :: Array DIM2 Int -> Int -> Array DIM2 Int
        addIt accum i = accum +^ vs
           where 
             vs = toAdd (i+1) n (slice arr (Z:.All:.i))
        (Z:.m:.n) = arrayExtent arr
    
    --
    -- Given an @i@ and a length @len@ and a 1D array @arr@ 
    -- (of length n) produces a 2D array of dimensions n X len.
    -- The columns are filled with zeroes up to row @i@.
    -- Subsequently they are filled with columns equal to the 
    -- values in @arr.
    --
    toAdd :: Int -> Int -> Array DIM1 Int -> Array DIM2 Int
    toAdd i len arr = traverse arr (\sh -> sh:.(len::Int)) 
                   (\_ (Z:.n:.m) -> if m >= i then arr ! (Z:.n) else 0) 
    

    The function for calculating the integral image can then be defined as

    vertScan :: Array DIM2 Int -> Array DIM2 Int
    vertScan = transpose . horizScan . transpose
    
    integralImage = horizScan . vertScan
    

    Given that scan has been implemented for Accelerate, it shouldn't be too hard to add it to Repa. I am not sure if an efficient implementation using the existing Repa primitives is possible or not.