My understanding to par
is that it will create a thread in another core to execution.
But I failed proof this understanding with following test code since the result showing seems only one thread is running.
Could you help me to figure out what's wrong here?
import Control.Monad
import Control.Parallel
import Control.Concurrent
import System.IO.Unsafe
fib :: Int -> Int
fib 0 = 1
fib 1 = 1
fib n = (fib (n-1)) + (fib (n - 2))
test :: String -> [Int] -> IO ()
test _ [] = return ()
test name (a:xs) = do
tid <- myThreadId
print $ (show tid) ++ "==>" ++ (show a) ++ "==>" ++ (show $ fib a) ++ "==>" ++ name
x `par` y
where x = test "2" xs
y = test "3" (tail xs)
main = test "1" [10..35]
Compiled with:
ghc --make -threaded -rtsopts test-par.hs
time ./test-par +RTS -N2
The result
"ThreadId 3==>10==>89==>1"
"ThreadId 3==>12==>233==>3"
"ThreadId 3==>14==>610==>3"
"ThreadId 3==>16==>1597==>3"
"ThreadId 3==>18==>4181==>3"
"ThreadId 3==>20==>10946==>3"
"ThreadId 3==>22==>28657==>3"
"ThreadId 3==>24==>75025==>3"
"ThreadId 3==>26==>196418==>3"
"ThreadId 3==>28==>514229==>3"
"ThreadId 3==>30==>1346269==>3"
"ThreadId 3==>32==>3524578==>3"
"ThreadId 3==>34==>9227465==>3"
real 0m1.131s
user 0m0.668s
sys 0m0.492s
How many core I have?
cat /proc/cpuinfo | grep processor | wc -l
2
-------------------------------- update
I think this paper by Simon Marlow is a good reference for such newbie question.
No, par
does not guarantee to create another thread.
It registers a spark in the runtime, which may cause the computation to be executed in a different thread, depending on the dynamic workload of the machine.
Creating a spark is very cheap, so you can create many (1000x) more than you have threads, and the runtime will just try to keep all your cores busy.
In your case, your x
computation is registered as a spark, and then immediately discarded (you never refer to it again). So the garbage collector can remove it.
To parallelize a recursive function, you'll typically want to just use par
up to some depth.
An example -- a recursive function with a cutoff depth:
import Control.Parallel
import Control.Monad
import Text.Printf
cutoff = 35
fib' :: Int -> Integer
fib' 0 = 0
fib' 1 = 1
fib' n = fib' (n-1) + fib' (n-2)
fib :: Int -> Integer
fib n | n < cutoff = fib' n
| otherwise = r `par` (l `pseq` l + r)
where
l = fib (n-1)
r = fib (n-2)
main = forM_ [0..45] $ \i ->
printf "n=%d => %d\n" i (fib i)
Run as:
$ ghc -O2 -threaded --make A.hs -rtsopts -fforce-recomp
$ ./A +RTS -N16 -s
Yields a parallel workload 1248.9% over sequential (i.e. 12.48x):
Tot time (elapsed) Avg pause Max pause
Gen 0 6264 colls, 6263 par 5.23s 0.41s 0.0001s 0.0109s
Gen 1 1 colls, 1 par 0.00s 0.00s 0.0002s 0.0002s
Parallel GC work balance: 7.09 (1797194 / 253525, ideal 16)
MUT time (elapsed) GC time (elapsed)
Task 0 (worker) : 7.56s ( 9.36s) 0.92s ( 0.89s)
Task 1 (worker) : 0.12s ( 10.21s) 0.02s ( 0.05s)
Task 2 (bound) : 8.39s ( 9.51s) 0.70s ( 0.74s)
Task 3 (worker) : 0.00s ( 0.00s) 0.00s ( 0.00s)
Task 4 (worker) : 7.17s ( 9.60s) 0.53s ( 0.66s)
Task 5 (worker) : 7.28s ( 9.55s) 0.56s ( 0.71s)
Task 6 (worker) : 7.48s ( 9.52s) 0.56s ( 0.74s)
Task 7 (worker) : 7.11s ( 9.54s) 0.66s ( 0.72s)
Task 8 (worker) : 7.41s ( 9.62s) 0.70s ( 0.64s)
Task 9 (worker) : 7.69s ( 9.48s) 0.66s ( 0.78s)
Task 10 (worker) : 7.56s ( 9.51s) 0.56s ( 0.75s)
Task 11 (worker) : 7.69s ( 9.42s) 0.86s ( 0.84s)
Task 12 (worker) : 7.42s ( 9.40s) 0.92s ( 0.86s)
Task 13 (worker) : 7.28s ( 9.39s) 0.91s ( 0.86s)
Task 14 (worker) : 7.44s ( 9.38s) 0.91s ( 0.87s)
Task 15 (worker) : 7.25s ( 9.33s) 1.11s ( 0.93s)
Task 16 (worker) : 7.94s ( 9.33s) 0.97s ( 0.93s)
Task 17 (worker) : 7.59s ( 9.37s) 1.06s ( 0.88s)
SPARKS: 597 (446 converted, 0 dud, 1 GC'd, 150 fizzled)
Productivity 96.1% of total user, 1245.3% of total elapsed
We created 597 sparks, of which 446 were converted into threads.
If you explicitly want to do manual thread creation and communication, this can be done via forkIO
and MVars
.