Is Eigen library matrix/vector manipulation faster than .net ones if the matrix is dense and unsymmetrical?

I have some matrix operations, mostly dealing with operations like running over all the each of the rows and columns of the matrix and perform multiplication a*mat[i,j]*mat[ii,j]:

public double[] MaxSumFunction()
{
   var maxSum= new double[vector.GetLength(1)];
   for (int j = 0; j < matrix.GetLength(1); j++)
   {
        for (int i = 0; i < matrix.GetLength(0); i++) 
        {
             for (int ii = 0; ii < matrix.GetLength(0); ii++)
              {
                   double wi= Math.Sqrt(vector[i]);
                   double wii= Math.Sqrt(vector[ii]);
                   maxSum[j] += SomePowerFunctions(wi, wii) * matrix[i, j]*matrix[ii, j];
              }
          }                      
     }
 }

    private double SomePowerFunctions(double wi, double wj)
    {

        var betaij = wi/ wj;
        var numerator = 8 * Math.Sqrt(wi* wj) * Math.Pow(betaij, 3.0 / 2)
            * (wi+ betaij * wj);
        var dominator = Math.Pow(1 - betaij * betaij, 2) +
            4 * wi* wj* betaij * (1 + Math.Pow(betaij, 2)) +
            4 * (wi* wi+ wj* wj) * Math.Pow(betaij, 2);


        if (wi== 0 && wj== 0)
        {
            if (Math.Abs(betaij - 1) < 1.0e-8)
                return 1;
            else
                return 0;
        }

        return numerator / dominator;
    }

I found such loops to be particularly slow if the matrix size is big.

I want the speed to be fast. So I am thinking about re-implementing these algorithms using the Eigen library.

My matrix is not symmetrical, not sparse and contains no regularity that any solver can exploit reliably.

I read that Eigen solver can be fast because of:

Compiler optimization
Vectorization
Multi-thread support

But I wonder those advantages are really applicable given my matrix characteristics?

Note: I could have just run a sample or two to find out, but I believe that asking the question here and have it documented on the Internet is going to help others as well.

Solution

Before thinking about low level optimizations, look at your code and observe that many quantities are recomputed many time. For instance, f(wi,wii) does not depend on j, so they could either be precomputed once (see below) or you can rewrite your loop to make the loop on j the nested one. Then the nested loop will simply be a coefficient wise product between a constant scalar and two columns of your matrix (I don't .net and assume j is indexing columns). If the storage if column-major, then this operation should be fully vectorized by your compiler (again, I don't know .net, but any C++ compiler will do, and if you Eigen, it will be vectorized explicitly). This should be enough to get a huge performance boost.

Depending on the sizes of matrix, you might also try to leverage optimized matrix-matrix implementation by precomputed f(wi,wii) into a MatrixXd F; (using Eigen's language), and then observe that the whole computation amount to:

VectorXd v = your_vector;
MatrixXd F = MatrixXd::nullaryExpr(n,n,[&](Index i,Index j) {
                 return SomePowerFunctions(sqrt(v(i)), sqrt(v(j)));
             });
MatrixXd M = your_matrix;
MatrixXd FM = F * M;
VectorXd maxSum = (M.array() * FM.array()).colwise().sum();