I've converted some MATLAB code to C++ to speed it up, using the Armadillo library to handle the matrix operations, but surprisingly it is 10 times slower than the MATLAB code!
So I tested the Armadillo library on its own to see whether it's the cause. The code below is a simple test that initializes two matrices, adds them together and saves the result to a new matrix. One section of the code uses Armadillo and the other doesn't. The section using Armadillo is much slower (note the elapsed times).
Does Armadillo really slow down the execution (though it is supposed to speed it up), or am I missing something?
#include <iostream>
#include <math.h>
#include <chrono>
#include <armadillo>
using namespace std;
using namespace arma;
int main()
{
    auto start = std::chrono::high_resolution_clock::now();
    double a[100][100];
    double b[100][100];
    double c[100][100];
    for (int i = 0; i < 100; i++)
    {
        for (int j = 0; j < 100; j++)
        {
            a[i][j] = 1;
            b[i][j] = 1;
            c[i][j] = a[i][j] + b[i][j];
        }
    }
    auto finish = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = finish - start;
    std::cout << "Elapsed time: " << elapsed.count() << " s\n";
    auto start1 = std::chrono::high_resolution_clock::now();
    mat a1=ones(100,100);
    mat b1=ones(100,100);
    mat c1(100,100);
    c1 = a1 + b1;
    auto finish1 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed1 = finish1 - start1;
    std::cout << "Elapsed time: " << elapsed1.count() << " s\n";
    return 0;
}
Here is the output I get:
Elapsed time: 5.1729e-05 s
Elapsed time: 0.00025536 s
As you see, Armadillo is significantly slower! Is it better not to use the Armadillo library?
First of all, make sure that the blas and lapack libraries are enabled; there are instructions in the Armadillo documentation.
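The second thing is that Armadillo might be doing more extensive memory allocation, and in your test that allocation is inside the timed region. If you restructure your code so that the memory allocation and initialisation happen first, before either timer starts, like this: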
#include <iostream>
#include <math.h>
#include <chrono>
#include <armadillo>
using namespace std;
using namespace arma;
int main()
{
    // Allocate everything up front so neither timed region pays for it.
    double a[100][100];
    double b[100][100];
    double c[100][100];
    mat a1=ones(100,100);
    mat b1=ones(100,100);
    mat c1(100,100);
    // Timed region 1: plain arrays (filling a and b is still timed here).
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 100; i++)
    {
        for (int j = 0; j < 100; j++)
        {
            a[i][j] = 1;
            b[i][j] = 1;
            c[i][j] = a[i][j] + b[i][j];
        }
    }
    auto finish = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = finish - start;
    std::cout << "Elapsed time: " << elapsed.count() << " s\n";
    // Timed region 2: Armadillo, addition only.
    auto start1 = std::chrono::high_resolution_clock::now();
    c1 = a1 + b1;
    auto finish1 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed1 = finish1 - start1;
    std::cout << "Elapsed time: " << elapsed1.count() << " s\n";
    return 0;
}
With this I got the result:
Elapsed time: 0.000647521 s
Elapsed time: 0.000353198 s
I compiled it on Ubuntu 17.10 with:
g++ prog.cpp -larmadillo
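A single run of a few hundred microseconds is close to the noise level of the clock, so it may be worth repeating the operation many times and averaging. Below is a minimal sketch of that idea. It uses plain std::vector buffers so it has no dependencies, and the helper names mean_seconds and add_matrices are my own, not part of Armadillo.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an Armadillo function): average seconds per call
// of `op`, measured over `reps` repetitions after one untimed warm-up call.
template <typename Op>
double mean_seconds(Op op, int reps)
{
    using timer_clock = std::chrono::steady_clock;
    op();  // warm-up, excluded from the measurement
    const auto start = timer_clock::now();
    for (int r = 0; r < reps; ++r)
        op();
    const std::chrono::duration<double> elapsed = timer_clock::now() - start;
    return elapsed.count() / reps;
}

// Element-wise c = a + b on flat buffers of equal size. The buffers are
// allocated by the caller, so allocation stays outside the timed region.
void add_matrices(const std::vector<double>& a,
                  const std::vector<double>& b,
                  std::vector<double>& c)
{
    assert(a.size() == c.size() && b.size() == c.size());
    for (std::size_t k = 0; k < c.size(); ++k)
        c[k] = a[k] + b[k];
}
```

To use it, allocate three vectors of 100 * 100 doubles first, then call something like mean_seconds([&] { add_matrices(a, b, c); }, 10000). The Armadillo version could be timed the same way by wrapping c1 = a1 + b1; in the lambda once a1, b1 and c1 have been constructed. Note also that the compile command above has no optimisation flag; as far as I know Armadillo relies heavily on templates and inlining, so rebuilding with -O2 before comparing is probably worthwhile.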