Search code examples
f#parallel-processingdiscrete-mathematicsperfect-numbers

F# parallelizing issue when calculating perfect numbers?


I am trying to optimize a small program which calculates perfect numbers from a given exponent.

The program runs (almost) perfectly, but when I open the task manager, it still runs on a single thread. That means that I must be doing something wrong, but my knowledge of F# is still in a 'beginning' phase.

I will try to put this question as clear as I possibly can, but if I fail in doing so, please let me know.

A perfect number is a number where the sum of all it's divisors (except for the number itself) is equal to the number itself (e.g. 6 is perfect, since the sum of it's divisors 1, 2 and 3 are 6).

I use prime numbers to speed up the calculation, that is I am not interested in (huge) lists where all the divisors are stored. To do so, I use the formula that Euclid proved to be correct: (2*(power of num - 1)) * ( 2* (power of num - 1)) where the latter is a Mersenne prime. I used a very fast algorithm from stackoverflow (by @Juliet) to determine whether a given number is a prime.

As I have been reading through several articles (I have not yet purchased a good book, so shame on me) on the Internet, I found out that sequences perform better than lists. So that is why I first started to create a function which generates a sequence of perfect numbers:

   let perfectNumbersTwo (n : int) =  
    seq { for i in 1..n do 
           if (PowShift i) - 1I |> isPrime 
           then yield PowShift (i-1) * ((PowShift i)-1I)
        } 

The helperfunction PowShift is implemented as following:

    let inline PowShift (exp:int32) = 1I <<< exp ;;

I use a bit shift operator, since the base of all power calculations is from 2, hence this could be an easy way. Of course I am still grateful for the contributions on the question I asked about this on: F# Power issues which accepts both arguments to be bigints>F# Power issues which accepts both arguments to be bigints

The function Juliet created (borrowed here) is as following:

let isPrime ( n : bigint) = 
    let maxFactor = bigint(sqrt(float n))
    let rec loop testPrime tog =
        if testPrime > maxFactor then true
        elif n % testPrime = 0I then false
        else loop (testPrime + tog) (6I - tog)
    if n = 2I || n = 3I || n = 5I then true
    elif n <= 1I || n % 2I = 0I || n % 3I = 0I || n % 5I = 0I then false
    else loop 7I 4I;;

Using this code, without parallel, it takes about 9 minutes on my laptop to find the 9th perfect number (which consists of 37 digits, and can be found with value 31 for the exponent). Since my laptop has a CPU with two cores, and only one is running at 50 percent (full load for one core) I though that I could speed up the calculations by calculating the results parallel.

So I changed my perfectnumber function as following:

//Now the function again, but async for parallel computing
let perfectNumbersAsync ( n : int) =
    async {
        try
            for x in 1.. n do
                if PowShift x - 1I |> isPrime then
                    let result = PowShift (x-1) * ((PowShift x)-1I)
                    printfn "Found %A as a perfect number" result
        with
            | ex -> printfn "Error%s" (ex.Message);
    }

To call this function, I use a small helper function to run it:

 let runPerfects n =
    [n]
        |> Seq.map perfectNumbersAsync
        |> Async.Parallel
        |> Async.RunSynchronously
        |> ignore

Where the result of async calculation is ignored, since I am displaying it within the perfectNumbersAsync function.

The code above compiles and it runs, however it still uses only one core (although it runs 10 seconds faster when calculating the 9th perfect number). I am afraid that it has to do something with the helper functions PowShift and isPrime, but I am not certain. Do I have to put the code of these helper functions within the async block of perfectNumbersAsync? It does not improve readability...

The more I play with F#, the more I learn to appreciate this language, but as with this case, I am in need of some experts sometimes :).

Thanks in advance for reading this, I only hope that I made myself a bit clear...

Robert.


Solution

  • @Jeffrey Sax's comment is definitely interesting, so I took some time to do a small experiment. The Lucas-Lehmer test is written as follows:

    let lucasLehmer p =
        let m = (PowShift p) - 1I
        let rec loop i acc =
            if i = p-2 then acc
            else loop (i+1) ((acc*acc - 2I)%m)
        (loop 0 4I) = 0I
    

    With the Lucas-Lehmer test, I can get first few perfect numbers very fast:

    let mersenne (i: int) =     
        if i = 2 || (isPrime (bigint i) && lucasLehmer i) then
            let p = PowShift i
            Some ((p/2I) * (p-1I))
        else None
    
    let runPerfects n =
        seq [1..n]
            |> Seq.choose mersenne
            |> Seq.toArray
    
    let m1 = runPerfects 2048;; // Real: 00:00:07.839, CPU: 00:00:07.878, GC gen0: 112, gen1: 2, gen2: 1
    

    The Lucas-Lehmer test helps to reduce the time checking prime numbers. Instead of testing divisibility of 2^p-1 which takes O(sqrt(2^p-1)), we use the primality test which is at most O(p^3). With n = 2048, I am able to find first 15 Mersenne numbers in 7.83 seconds. The 15th Mersenne number is the one with i = 1279 and it consists of 770 digits.

    I tried to parallelize runPerfects using PSeq module in F# Powerpack. PSeq doesn't preserve the order of the original sequence, so to be fair I have sorted the output sequence. Since the primality test is quite balance among indices, the result is quite encouraging:

    #r "FSharp.Powerpack.Parallel.Seq.dll"    
    open Microsoft.FSharp.Collections
    
    let runPerfectsPar n =
        seq [1..n]
            |> PSeq.choose mersenne
            |> PSeq.sort (* align with sequential version *)
            |> PSeq.toArray 
    
    let m2 = runPerfectsPar 2048;; // Real: 00:00:02.288, CPU: 00:00:07.987, GC gen0: 115, gen1: 1, gen2: 0
    

    With the same input, the parallel version took 2.28 seconds which is equivalent to 3.4x speedup on my quad-core machine. I believe the result could be improved further if you use Parallel.For construct and partition the input range sensibly.