c++ eigen matrix-multiplication memory-alignment eigen3

Eigen: Should I use an aligned Map for intensive computations?

I want to perform a lot of computations on externally allocated data, especially matrix multiplications. This can be done via Eigen::Map. I'm not an expert in vectorized computation, but as far as I can see it is possible to specify an Aligned flag for Map.

I decided to check the performance difference between matrix multiplication via Eigen::MatrixXf and via Eigen::Map:

#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <Eigen/Dense>

void testMatProduct(
        const Eigen::MatrixXf &a,
        const Eigen::MatrixXf &b,
        Eigen::MatrixXf &res)
{
    const auto startTime = std::chrono::high_resolution_clock::now();
    res.noalias() = a * b;
    const auto endTime = std::chrono::high_resolution_clock::now();
    const auto duration = std::chrono::duration_cast<std::chrono::microseconds>( endTime - startTime ).count();
    std::cout << "Mat product elapsed " << duration / 1.0e6 << std::endl;
}

using EigenMap = Eigen::Map<Eigen::MatrixXf, Eigen::Unaligned>;

void testMapProduct(
        const EigenMap &a,
        const EigenMap &b,
        EigenMap &res)
{
    const auto startTime = std::chrono::high_resolution_clock::now();
    res.noalias() = a * b;
    const auto endTime = std::chrono::high_resolution_clock::now();
    const auto duration = std::chrono::duration_cast<std::chrono::microseconds>( endTime - startTime ).count();
    std::cout << "Map product elapsed " << duration / 1.0e6 << std::endl;
}

int main(int, char **)
{    
    srand(42);
    const int64_t N = 7000;
    const int64_t K = 6000;
    const int64_t M = 100;
    Eigen::MatrixXf mat1 = Eigen::MatrixXf::Random(N, K);
    Eigen::MatrixXf mat2 = Eigen::MatrixXf::Random(K, M);
    Eigen::MatrixXf matRes = Eigen::MatrixXf::Zero(N, M);

    // Copy data from mats to vecs
    Eigen::VectorXf vec1 = Eigen::Map<Eigen::MatrixXf>(mat1.data(), mat1.rows() * mat1.cols(), 1);
    Eigen::VectorXf vec2 = Eigen::Map<Eigen::MatrixXf>(mat2.data(), mat2.rows() * mat2.cols(), 1);
    Eigen::VectorXf vecRes = Eigen::VectorXf::Zero(N * M);

    EigenMap map1 = EigenMap(vec1.data(), mat1.rows(), mat1.cols());
    EigenMap map2 = EigenMap(vec2.data(), mat2.rows(), mat2.cols());
    EigenMap mapRes = EigenMap(vecRes.data(), matRes.rows(), matRes.cols());
    for(int i = 0; i < 10; ++i){
        testMapProduct(map1, map2, mapRes);
        testMatProduct(mat1, mat2, matRes);
        matRes.setZero();
        vecRes.setZero();
    }

    return 0;
}

I'm pretty sure this is not a rigorous benchmark, but it should give me some intuition. I compile it with -march=native and it prints the following output:

Map product elapsed 0.102751
Mat product elapsed 0.10224
Map product elapsed 0.10022
Mat product elapsed 0.100726
Map product elapsed 0.09963
Mat product elapsed 0.100697
Map product elapsed 0.099673
Mat product elapsed 0.100809
Map product elapsed 0.100195
.......

So it seems to me that there is no big difference between the Map product and the Matrix product.

My questions are:

1) What is the difference between Map<MatrixXf, Unaligned> and Map<MatrixXf, Aligned> in terms of performance? Should I care about Map alignment for other operations such as dot products, element-wise addition, etc.?

2) Is my comparison correct?

PS Sorry for my poor English

Solution

1) Data alignment describes how data is arranged in memory and how it has to be accessed. Eigen::MatrixXf is a matrix of float whose dimensions are not known at compile time, so its data pointer must be aligned on at least a 4-byte (32-bit) boundary (assuming float is represented with 32 bits on your system).

What impact do different data alignment specifications have on performance? To answer this question, consider the following discussion:
Talk: On a 32-bit architecture, would a 16-bit value not aligned on a 32-bit boundary be accessed more slowly?

The main argument that it affects performance: packing two 16-bit values into a 32-bit register means that you must spend resources on converting the data from one format to the other.

One may argue that languages such as C/C++ support sub-word access, which means you don't have to convert the values; you save memory and suffer no negative impact on performance.

I would assume that Eigen automatically detects that the data pointer of an Eigen::MatrixXf is aligned on a 4-byte boundary, so there are no performance implications if you leave out the MapOptions template argument or set it to Eigen::Unaligned. If you want to state the alignment explicitly, note that Eigen's alignment enumerator has no 4-byte value: the options are Eigen::Aligned8, Aligned16, Aligned32, Aligned64 and Aligned128 (Eigen::Aligned is deprecated and a synonym for Aligned16, i.e. 128 bits). Only pass one of them if the pointer really is aligned to that many bytes. You can take a look at the alignment enumerator here.
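Since the data is allocated externally, it may help to check the pointer at run time before promising Eigen a particular alignment. The following is only a minimal sketch: isAlignedTo is a hypothetical helper of mine, not part of Eigen, and it assumes the requested alignment is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an Eigen function): returns true when `ptr`
// is a multiple of `alignment` bytes.
// Assumption: `alignment` is a non-zero power of two (e.g. 16 or 32).
inline bool isAlignedTo(const void* ptr, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // For a power of two, the low bits of the address must all be zero.
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}
```

If isAlignedTo(data, 16) holds, a Map declared with Eigen::Aligned16 should be safe for that pointer; otherwise Eigen::Unaligned is the conservative choice.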

2) Eigen::Map has the benefit that matrices and vectors can be initialized without copying data, unlike Eigen::Matrix and Eigen::Vector. I'm pretty sure that Eigen::Map and Eigen::Matrix use the same underlying operations for multiplication, addition, etc.; only the way the data is referenced differs. The only performance benefit I can see from using Eigen::Matrix is spatial locality (cache performance) when the Eigen::Map objects refer to two matrices/vectors that are far apart in memory and the matrices are huge. That, of course, assumes the two Eigen::Matrix objects were initialized right after one another, so that they are contiguous in memory.
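As for the comparison itself, single timings are noisy, so one option is to repeat each product several times and keep the fastest run. This is a rough sketch, not a rigorous benchmark: bestOfSeconds is a hypothetical helper name, and it uses only the standard library so it can wrap either the Map or the Matrix product.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper: calls `fn` exactly `repeats` times and returns the
// duration of the fastest call in seconds.
template <typename Fn>
double bestOfSeconds(Fn&& fn, int repeats)
{
    assert(repeats > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < repeats; ++i) {
        // steady_clock is monotonic, which suits interval measurements.
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(stop - start).count());
    }
    return best;
}
```

For example, bestOfSeconds([&] { res.noalias() = a * b; }, 10) would time the product from the question ten times and report the best run.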