I wanted to optimize recursion with openMP. So, I started from this question: best-way-to-parallelize-this-recursion-using-openmp
While looking to optimize recursive functions I was first interested in openMP. Starting from the code below that I found there (https://en.cppreference.com/w/cpp/chrono) :
#include <iostream>
#include <chrono>
long fibonacci(unsigned n)
{
if (n < 2) return n;
return fibonacci(n-1) + fibonacci(n-2);
}
int main()
{
auto start = std::chrono::steady_clock::now();
std::cout << "f(42) = " << fibonacci(42) << '\n';
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> elapsed_seconds = end-start;
std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";
}
Compiled with g++ without any optimization it gives the result:
f(42) = 267914296
elapsed time: 1.88232s
Then I used openMP similarly to the answer upside... but I had to stop the process because it took a very very long time. I do not know a lot about openMP recursion parallelism since I just use it to optimize for loops in my code... Also, I found something in gcc documentation :
__attributes__((const))
Then I added it to my openMP "optimized" version but all I obtained is a Core Dump !!!
So I removed my omp pragmas to check that the core dump is due to my code or something else...
Then the code became:
#include <iostream>
#include <chrono>
__attribute__ ((const))
long fibonacci(unsigned n)
{
int long r[2];
if (n < 2) return n;
r[0]=fibonacci(n-1);
r[1]=fibonacci(n-2);
return (r[0]+r[1]);
}
int main()
{
auto start = std::chrono::steady_clock::now();
std::cout << "f(42) = " << fibonacci(42) << '\n';
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> elapsed_seconds = end-start;
std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";
}
I compiled it with the following command:
g++ fibonacci.cpp -o fibonacci -Wall -O2 -march=native
Now it take a lot less time to perform the fibonacci computation of 42:
f(42) = 267914296
elapsed time: 0.00106504s
in power save mode and in performance mode it gives
f(42) = 267914296
elapsed time: 0.000187806s
Here a lscpu of my computer (if somebody want to compare results):
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
CPU family: 6
Model: 158
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 9
CPU(s) scaling MHz: 95%
CPU max MHz: 4500,0000
CPU min MHz: 800,0000
BogoMIPS: 8403,00
Now, I do not know how much time it could take with openMP because I was not able to obtain a satisfactory result... I hope somebody will find a way to add omp pragmas to this kind of recursive cases.
As required, here is the omp version 1 of this case:
#include <iostream>
#include <chrono>
#include <omp.h>
__attribute__ ((const))
long fibonacci(unsigned n)
{
int long a, b;
if (n < 2) return n;
#pragma omp parallel
#pragma omp single nowait
{
#pragma omp task shared(a)
a=fibonacci(n-1);
#pragma omp task shared(b)
b=fibonacci(n-2);
#pragma omp taskwait
}
return (a+b);
}
int main()
{
auto start = std::chrono::steady_clock::now();
std::cout << "f(42) = " << fibonacci(42) << '\n';
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> elapsed_seconds = end-start;
std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";
}
And the version that drive me to a "silly code":
#include <iostream>
#include <chrono>
#include <omp.h>
__attribute__ ((const))
long fibonacci(unsigned n)
{
int long r[2];
int i;
if (n < 2) return n;
#pragma omp parallel
#pragma omp single
for(i=0;i<2;++i)
{
#pragma omp task shared(r)
r[i]=fibonacci(n-i-1);
}
return (r[0]+r[1]);
}
int main()
{
auto start = std::chrono::steady_clock::now();
std::cout << "f(42) = " << fibonacci(42) << '\n';
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> elapsed_seconds = end-start;
std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";
}
The build command for these omp versions is :
g++ omp_fibonacci_for.cpp -o omp_fibonacci_for -Wall -fopenmp
Both of them seems to loop "for ever" (fortunately Ctrl-C worked fine). But, while writing the end of this question, the second version with "for", process successfully:
f(42) = 267914296
elapsed time: 480.953s
To be clear I'm not looking for "best perf", my objective is to try to obtain a omp version that process in a reasonable time with the goal of finding the way to use openMP on recursive code. Honestly I was expecting a better execution time than 480s ! I was also surprised by the time it took in the "poor debug" version that took less 0.001s to process...
The thing is now I'd like to know if the way I use openMP is acceptable if I apply this model with "havier" tasks or if I do something wrong.
So, after taken into consideration all comments and answers, this new code seems to be a better way to apply openMP pragmas on a recursive code:
#include <iostream>
#include <chrono>
#include <omp.h>
__attribute__ ((const))
long fibonacci(unsigned n)
{
int long a, b;
if (n < 2) return n;
#pragma omp task shared(a)
a=fibonacci(n-1);
#pragma omp task shared(b)
b=fibonacci(n-2);
#pragma omp taskwait
return (a+b);
}
int main()
{
auto start = std::chrono::steady_clock::now();
long res;
#pragma omp parallel
{
#pragma omp single
res = fibonacci(42);
}
std::cout << "f(42) = " << res << '\n';
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> elapsed_seconds = end-start;
std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";
}
Build command:
g++ omp_fibonacci.cpp -o omp_fibonacci -Wall -fopenmp
Output:
f(42) = 267914296
elapsed time: 255.225s
@DanielLangr comments where particularly useful.
This is the best I obtained with the use of openMP.
And, with @JerryCoffin answer and @ThomasMatthiew comments and the advice of @VictorEijkhout... I though about parallelization differently and it drive me to this code:
#include <iostream>
#include <chrono>
#include <omp.h>
__attribute__ ((const))
long fibonacci(unsigned n)
{
int long a, b;
if (n < 2) return n;
a=fibonacci(n-2);
b=fibonacci(n-1);
return (a+b);
}
long omp_fibonacci(unsigned n)
{
int long a, b;
if (n < 2) return n;
#pragma omp task shared(a)
a=fibonacci(n-1);
#pragma omp task shared(b)
b=fibonacci(n-2);
#pragma omp taskwait
return (a+b);
}
int main()
{
auto start = std::chrono::steady_clock::now();
long res;
#pragma omp parallel
{
#pragma omp single
res = omp_fibonacci(42);
}
std::cout << "f(42) = " << res << '\n';
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> elapsed_seconds = end-start;
std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";
}
Outputs:
f(42) = 267914296
elapsed time: 1.00274s