Apply function to all pairs efficiently

I need a higher-order function pairApply that applies a binary function f to all unique pairs of elements of a list-like structure and then combines the results somehow. An example / sketch:

pairApply (+) f [a, b, c] = f a b + f a c + f b c

Some research leads me to believe that Data.Vector.Unboxed will probably have good performance (I will also need fast access to specific elements); it is also required by Statistics.Sample, which would come in handy further down the line.

With this in mind I have the following, which almost compiles:

import qualified Data.Vector.Unboxed as U      

pairElement :: (U.Unbox a, U.Unbox b)    
            => (U.Vector a)                    
            -> (a -> a -> b)                   
            -> Int                             
            -> a                               
            -> (U.Vector b)                    
pairElement v f idx el =
  U.map (f el) $ U.drop (idx + 1) v            

pairUp :: (U.Unbox a, U.Unbox b)   
       => (a -> a -> b)                        
       -> (U.Vector a)                         
       -> (U.Vector (U.Vector b))
pairUp f v = U.imap (pairElement v f) v 

pairApply :: (U.Unbox a, U.Unbox b)
          => (b -> b -> b)                     
          -> b                                 
          -> (a -> a -> b)                     
          -> (U.Vector a)                      
          -> b
pairApply combine neutral f v =
  folder $ U.map folder (pairUp f v) where
  folder = U.foldl combine neutral

The reason this doesn't compile is that there is no Unbox instance for U.Vector (U.Vector a). I have been able to create new Unbox instances in other cases using Data.Vector.Unboxed.Deriving, but I'm not sure it would be so easy in this case. (Perhaps by transforming it into a tuple whose first element is all the inner vectors concatenated and whose second element holds the lengths of the inner vectors, so that they can be unpacked again?)

My question can be stated in two parts:

Does the above implementation make sense at all, or is there some quick library function magic etc. that could do it much more easily?
If it does make sense, is there a better way to make an unboxed vector of vectors than the one sketched above?

Note that I'm aware that foldl is probably not the best choice; once I've got the implementation sorted I plan to benchmark with a few different folds.

Solution

There is no way to define a classical instance for Unbox (U.Vector b), because that would require preallocating a memory area in which each element (i.e. each subvector!) has the same fixed amount of space. But in general, each of them may be arbitrarily big, so that's not feasible at all.

It might in principle be possible to define that instance by storing only a flattened form of the nested vector plus an extra array of indices (where each subvector starts). I once briefly gave this a try; it actually seems somewhat promising as far as immutable vectors are concerned, but a G.Vector instance also requires a mutable implementation, and that's hopeless for such an approach (because any mutation that changes the number of elements in one subvector would require shifting everything behind it).
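As a rough illustration of the immutable half of that idea, here is a minimal sketch. The names Nested, fromVectors, outerLength and subvector are made up for this example and are not part of the vector library; it deliberately stops short of an Unbox instance, since the mutable side is the part that does not work out.

```haskell
import qualified Data.Vector.Unboxed as U

-- Hypothetical immutable nested vector: all subvectors concatenated,
-- plus the start offset of each subvector and one final end offset.
data Nested a = Nested
  { offsets :: U.Vector Int  -- length = number of subvectors + 1
  , flat    :: U.Vector a    -- concatenated elements of all subvectors
  }

-- Build the flattened form from a list of subvectors.
fromVectors :: U.Unbox a => [U.Vector a] -> Nested a
fromVectors vs = Nested
  { offsets = U.fromList (scanl (+) 0 (map U.length vs))
  , flat    = U.concat vs
  }

-- Number of subvectors.
outerLength :: Nested a -> Int
outerLength n = U.length (offsets n) - 1

-- Retrieve the i-th subvector as a slice of the flat storage (no copying).
subvector :: U.Unbox a => Nested a -> Int -> U.Vector a
subvector n i = U.slice start (end - start) (flat n)
  where
    start = offsets n U.! i
    end   = offsets n U.! (i + 1)
```

Reading works nicely this way, but growing or shrinking a single subvector in place would mean shifting every element stored after it, which is the problem described above.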

Usually it's just not worth it: unless the individual element vectors are very small, the overhead of boxing them won't matter, so it often makes sense to use B.Vector (U.Vector b), i.e. a boxed Data.Vector whose elements are unboxed vectors.
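For illustration, this is roughly how the code from the question could be made to compile with a boxed outer vector. It is only a sketch: it assumes B stands for Data.Vector, it builds the outer vector with B.generate instead of U.imap, and it uses strict folds where the question used foldl.

```haskell
import qualified Data.Vector as B
import qualified Data.Vector.Unboxed as U

-- For the element el at index idx, apply f against every later element.
pairElement :: (U.Unbox a, U.Unbox b)
            => U.Vector a -> (a -> a -> b) -> Int -> a -> U.Vector b
pairElement v f idx el = U.map (f el) (U.drop (idx + 1) v)

-- Boxed outer vector of unboxed rows; row i holds f (v!i) (v!j) for j > i.
pairUp :: (U.Unbox a, U.Unbox b)
       => (a -> a -> b) -> U.Vector a -> B.Vector (U.Vector b)
pairUp f v = B.generate (U.length v) (\i -> pairElement v f i (v U.! i))

-- Fold every row, then fold the per-row results, as in the question.
pairApply :: (U.Unbox a, U.Unbox b)
          => (b -> b -> b) -> b -> (a -> a -> b) -> U.Vector a -> b
pairApply combine neutral f v =
  B.foldl' combine neutral (B.map (U.foldl' combine neutral) (pairUp f v))
```

Note that, like the original, this folds each row starting from neutral and then combines the row results, so it only gives the expected answer when neutral really is a neutral element for combine.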

For your application, however, I would not do that at all: there's no need to ever store the pairings of each element with the elements after it in a single triangular array. (And it would be really bad for performance to do that, because it would make the algorithm take O(n²) memory rather than the O(n) that is all that's needed.)

I would just do the following:

pairApply combine neutral f v
 = U.ifoldl' (\acc i p -> U.foldl' (\acc' q -> combine acc' $ f p q)
                                   acc
                                   (U.drop (i+1) v) )
             neutral v

This corresponds pretty much to the obvious nested-loops imperative implementation:

pairApply(combine, b, f, v):
    for(i in 0..length(v)-1):
        for(j in i+1..length(v)-1):
            b = combine(b, f(v[i], v[j]));
    return b;
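Put together as a self-contained module, the fold-based version might look like the sketch below. The type signature is my own addition (note that only the input element type needs an Unbox instance here, since no vector of results is ever built), and sumAbsDiffs is a hypothetical usage example.

```haskell
import qualified Data.Vector.Unboxed as U

-- Apply f to every unique pair (v!i, v!j) with i < j and fold the results
-- into a single accumulator, without building any intermediate vectors of b.
pairApply :: U.Unbox a
          => (b -> b -> b) -> b -> (a -> a -> b) -> U.Vector a -> b
pairApply combine neutral f v = U.ifoldl' outer neutral v
  where
    -- Inner loop: pair p (at index i) with every element after it.
    outer acc i p =
      U.foldl' (\acc' q -> combine acc' (f p q)) acc (U.drop (i + 1) v)

-- Hypothetical usage: sum of absolute differences over all unique pairs.
sumAbsDiffs :: U.Vector Double -> Double
sumAbsDiffs = pairApply (+) 0 (\x y -> abs (x - y))
```

For instance, pairApply (+) 0 (*) applied to the vector [1, 2, 3] evaluates 1*2 + 1*3 + 2*3.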