I'm trying to scale all the columns in a matrix with a corresponding value from a vector. Where this value is 0, I want to replace that column with a column from an other matrix scaled by a constant. Sounds complicated, but in Matlab it's pretty simple (but probably not fully optimized):
a(:,b ~= 0) = a(:,b ~= 0)./b(b ~= 0);
a(:,b == 0) = c(:,b == 0)*x;
doing it with a for loop
in C++ would also be pretty simple:
RowVectorXf b;
Matrix3Xf a, c;
float x;
for (int i = 0; i < b.size(); i++) {
if (b(i) != 0) {
a.col(i) = a.col(i) / b(i);
} else {
a.col(i) = c.col(i) * x;
}
}
Is there a possibility to do this operation (faster) with Eigen intrinsics such as colwise
and select
?
p.s. I tried to shorten the if condition to the form
a.col(i) = (b(i) != 0) ? (a.col(i) / b(i)) : (c.col(i) * x);
But this does not compile with the error error: operands to ?: have different types ...(long listing of the types)
Edit: I added the code for testing the answers, here it is:
#include <Eigen/Dense>
#include <stdlib.h>
#include <chrono>
#include <iostream>
using namespace std;
using namespace Eigen;
void flushCache()
{
const int size = 20 * 1024 * 1024; // Allocate 20M. Set much larger than L2
volatile char *c = (char *) malloc(size);
volatile int i = 8;
for (volatile int j = 0; j < size; j++)
c[j] = i * j;
free((void*) c);
}
int main()
{
Matrix3Xf a(3, 1000000);
RowVectorXf b(1000000);
Matrix3Xf c(3, 1000000);
float x = 0.4;
a.setRandom();
b.setRandom();
c.setRandom();
for (int testNumber = 0; testNumber < 4; testNumber++) {
flushCache();
chrono::high_resolution_clock::time_point t1 = chrono::high_resolution_clock::now();
for (int repetition = 0; repetition < 1000; repetition++) {
switch (testNumber) {
case 0:
for (int i = 0; i < b.size(); i++) {
if (b(i) != 0) {
a.col(i) = a.col(i) / b(i);
} else {
a.col(i) = c.col(i) * x;
}
}
break;
case 1:
for (int i = 0; i < b.size(); i++) {
a.col(i) = (b(i) != 0) ? (a.col(i) / b(i)).eval() : (c.col(i) * x).eval();
}
break;
case 2:
for (int i = 0; i < b.size(); i++) {
a.col(i) = (b(i) != 0) ? (a.col(i) * (1.0f / b(i))) : (c.col(i) * x);
}
break;
case 3:
a = b.cwiseEqual(0.0f).replicate< 3, 1 >().select(c * x, a.cwiseQuotient(b.replicate< 3, 1 >()));
break;
default:
break;
}
}
chrono::high_resolution_clock::time_point t2 = chrono::high_resolution_clock::now();
auto duration = chrono::duration_cast< chrono::milliseconds >(t2 - t1).count();
cout << "duration: " << duration << "ms" << endl;
}
return 0;
}
Sample output is:
duration: 14391ms
duration: 15219ms
duration: 9148ms
duration: 13513ms
By the way, not using setRandom to init the variables, the output is totally different:
duration: 10255ms
duration: 11076ms
duration: 8250ms
duration: 5198ms
@chtz suggests it's because of denormalized values, but I think it's because of branch prediction. An evidance that it's because of branch prediction is, that initializing b.setZero();
leads to the same timings as not initializing.
a.col(i) = (b(i) != 0) ? (a.col(i) * (1.0f/b(i))) : (c.col(i) * x);
would work but only because the expressions would be of the same type, and it will likely not safe any time (a ? :
expression is essentially translated to the same as an if
-else
branch.)
If you prefer writing it into one line, the following expression should work:
a = b.cwiseEqual(0.0f).replicate<3,1>().select(c*x, a.cwiseQuotient(b.replicate<3,1>()));
Again, I doubt it will make any significant performance difference.