How to exploit any parallelism in my haskell parallel code?

I've just stated working in haskell semi-explicit parallelism with GHC 6.12. I've write the following haskell code to compute in parallel the map of the fibonnaci function upon 4 elements on a list, and in the same time the map of the function sumEuler upon two elements.

import Control.Parallel
import Control.Parallel.Strategies

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

mkList :: Int -> [Int]
mkList n = [1..n-1]

relprime :: Int -> Int -> Bool
relprime x y = gcd x y == 1

euler :: Int -> Int
euler n = length (filter (relprime n) (mkList n))

sumEuler :: Int -> Int
sumEuler = sum . (map euler) . mkList

-- parallel initiation of list walk                                                                                                                                    
mapFib :: [Int]
mapFib = map fib [37, 38, 39, 40]

mapEuler :: [Int]
mapEuler = map sumEuler [7600, 7600]

parMapFibEuler :: Int
parMapFibEuler = (forceList mapFib) `par` (forceList mapEuler `pseq` (sum mapFib + sum mapEuler))

-- how to evaluate in whnf form by forcing                                                                                                                                
forceList :: [a] -> ()
forceList [] = ()
forceList (x:xs) = x `pseq` (forceList xs)


main = do putStrLn (" sum : " ++ show parMapFibEuler)

to improve my program in parallel I rewrote it with par and pseq and a forcing function to force whnf evaluation. My problem is that by looking in the threadscope it appear that i didn't gain any parallelism. Things are worse because i didn't gain any speedup.

Threadscope observation

That why I have theses two questions

Question 1 How could I modify my code to exploit any parallelism ?

Question 2 How could I write my program in order to use Strategies (parMap, parList, rdeepseq and so on ...) ?

First improvement with Strategies

according to his contribution

parMapFibEuler = (mapFib, mapEuler) `using` s `seq` (sum mapFib + sum mapEuler) where
    s = parTuple2 (seqList rseq) (seqList rseq)

the parallelism appears in the threadscope but not enough to have a significant speedup

enter image description here

Solution

Your parallelism is far too course-grained to have much beneficial effect. The largest chunks of work that can be done in parallel efficiently are in sumEuler, so that's where you should add your par annotations. Try changing sumEuler to:

sumEuler :: Int -> Int
sumEuler = sum . (parMap rseq euler) . mkList

parMap is from Control.Parallel.Strategies; it expresses a map that can be done in parallel. The first argument, rseq having type Strategy a, is used to force the computation to a specific point, otherwise no work would be done, due to laziness. rseq is fine for most numeric types.

It's not useful to add parallelism to fib here, below about fib 40 there isn't enough work to make it worthwhile.

In addition to threadscope, it's useful to run your program with the -s flag. Look for a line like:

SPARKS: 15202 (15195 converted, 0 pruned)

in the output. Each spark is an entry in a work queue to possibly be performed in parallel. Converted sparks are actually done in parallel, while pruned sparks mean that the main thread got to them before a worker thread had the chance to do so. If the pruned number is high, it means your parallel expressions are too fine-grained. If the total number of sparks is low, you aren't trying to do enough in parallel.

Finally, I think parMapFibEuler is better written as:

parMapFibEuler :: Int
parMapFibEuler = sum (mapFib `using` parList rseq) + sum mapEuler

mapEuler is simply too short to have any parallelism usefully expressed here, especially as euler is already performed in parallel. I'm doubtful that it makes a substantial difference for mapFib either. If the lists mapFib and mapEuler were longer, parallelism here would be more useful. Instead of parList you may be able to use parBuffer, which tends to work well for list consumers.

Making these two changes cuts the runtime from 12s to 8s for me, with GHC 7.0.2.