Eigen: Optimizing Out of Bounds Matrix Blocks/ Slices Around Entry

As part of a personal project, I would like to optimize the following function so it runs as quickly as possible. Any performance gain matters since the performance of the rest of the program depends on it.

At the moment, I am using Eigen's block() function, but since negative and out of bounds indices are not valid arguments, I am using some additional code to only block() the necessary data.

This is the function I am trying to optimize; it attempts to extract a submatrix around a given matrix entry. The goal is to then hash the resulting matrix (using a variation of boost's hash_combine) and look it up in a hash table.

Function to Optimize

#include <iostream>
#include <Eigen/Dense>

using namespace Eigen;

/// Returns a subsection from a matrix given the center and radius of the section. Fills space outside base matrix bounds with 0.
/// \param baseMatrix: The matrix from which to take a section. Entires are integers, either 0 or 1.
/// \param centerRow: The row serving as the center of the subsection.
/// \param centerCol: The column serving as the center of the subsection.
/// \param radius: The radius of the subsection, including the center entry.
/// \return A matrix of the same type as the input matrix of size radius*2-1, representing the slice of the input matrix around the provided center coordinate.
Eigen::MatrixXi GetMatrixSection(const MatrixXi &baseMatrix, int centerRow, int centerCol, int radius)
{
    //== Constraints ==
    // baseMatrix, both dimensions can be between 1 and maxInt, but will generally be in range [3, 25] matrix is usually but not always square
    // 0 <= centerRow <= baseMatrix.rows()-1
    // 0 <= centerCol <= baseMatrix.cols()-1
    // 1 <= radius <= max(baseMatrix.rows(), baseMatrix.cols())
    //    This specific implementation of the function allows for radius of any size > 0

    // Create Base Matrix to fill
    int nSize = radius * 2 - 1; // Size of resulting matrix
    MatrixXi result = Eigen::MatrixXi().Constant(nSize, nSize, 0);

    // Get indices of the top-left corner for the block operation
    int lowerRowBound = centerRow - (radius-1);
    int lowerColBound = centerCol - (radius-1);

    // Get the top-left corner of baseMatrix
    int upperLeftCopyableRow = std::max(0, lowerRowBound);
    int upperLeftCopyableCol = std::max(0, lowerColBound);

    // Determine how many rows we need to take from the baseMatrix
    int numCopyableRows = std::min((int)baseMatrix.rows()-upperLeftCopyableRow, std::min(0, lowerRowBound)+nSize);
    int numCopyableCols = std::min((int)baseMatrix.cols()-upperLeftCopyableCol, std::min(0, lowerColBound)+nSize);

    if(numCopyableRows <= 0 || numCopyableCols <= 0) return result; // if it is impossible to copy anything from result, don't try

    // Copy all data we can from the baseMatrix
    MatrixXi copiedBlock = baseMatrix.block(upperLeftCopyableRow, upperLeftCopyableCol, numCopyableRows, numCopyableCols);

    // Copy the data from baseMatrix into resulting matrix
    result.block(upperLeftCopyableRow-lowerRowBound, upperLeftCopyableCol-lowerColBound,
                 (int)copiedBlock.rows(), (int)copiedBlock.cols()) = copiedBlock;

    // Return resulting matrix
    return result;
}

I'm using the following code to test the efficiency of the above function.

Timing Code for Function

int TestGetMatrixSection(MatrixXi matrixToTest, int trials=1)
{
    volatile int result = 0;
    for(int t = 0; t < trials; ++t) {
        for (int i = 1; i <= std::max(matrixToTest.rows(), matrixToTest.cols()); ++i) {
            for (int j = 0; j < matrixToTest.rows(); ++j) {
                for (int k = 0; k < matrixToTest.cols(); ++k) {
//                    std::cout << GetMatrixSection(matrixToTest, j, k, i) << "/n/n"; // printout
                    result += GetMatrixSection(matrixToTest, j, k, i).cols();
                }
            }
        }
    }
    return result;
}

int main()
{
    MatrixXi m = Eigen::MatrixXi(4, 5);
    m<< 1, 0, 0, 0, 1,
        1, 1, 0, 0, 1,
        1, 0, 1, 1, 0,
        1, 0, 0, 1, 0;
    auto startTime = std::chrono::steady_clock::now();
    TestGetMatrixSection(m, 10000);
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::steady_clock::now()-startTime).count() << " milliseconds\n";
    return 0;
}

Currently the function works, but I am worried about performance being an issue where it will be called millions of times.

Expected Output

Example 3x3 matrix
[[1, 0, 1],      
 [1, 1, 0],
 [0, 1, 0]]


row index 1, col index 1, radius 2

[[1, 0, 1],      
 [1, 1, 0],
 [0, 1, 0]]


row index 0, col index 0, radius 2

[[0, 0, 0],      
 [0, 1, 0],
 [0, 1, 1]]


row index 2, col index 2, radius 2

[[1, 0, 0],      
 [1, 0, 0],
 [0, 0, 0]]


row index 0, col index 0, radius 3

[[0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 1, 0, 1],
 [0, 0, 1, 1, 0],
 [0, 0, 0, 1, 0]]

Solution

Assuming the size of baseMatrix does not change, and you know a reasonable upper for radius I suggest to store that matrix into a larger matrix which is extended by 0-entries on every side. One possibility to implement this:

template<class S>
struct ExtendedMatrix{
    typedef Eigen::Matrix<S,Eigen::Dynamic, Eigen::Dynamic> MatS;
    typedef Eigen::Ref<MatS> Ref;
    typedef Eigen::Ref<MatS const> RefC;

    // read/write reference to baseMatrix:
    Ref getBaseMatrix() {
        return storage.block(maxR-1, maxR-1, storage.rows()-2*maxR+1, storage.cols()-2*maxR+1);
    }

    // read-only reference to baseMatrix:
    RefC getBaseMatrix() const {
        return storage.block(maxR-1, maxR-1, storage.rows()-2*maxR+1, storage.cols()-2*maxR+1);
    }

    // constructor. Takes an initial matrix and a maximum radius.
    template<class Derived>
    ExtendedMatrix(Eigen::MatrixBase<Derived> const& base, int maxRadius)
        : maxR(maxRadius), storage(base.rows()+2*maxR-1, base.cols()+2*maxR-1)
    {
        storage.setZero();
        getBaseMatrix() = base;
    }

    // method to get a (read-only) block of size 2radius-1 around a given coordinate
    // entries outside the original matrix are filled with 0s.
    RefC getMatrixSection(int centerRow, int centerCol, int radius) const
    {
        assert(radius <= maxR);
        int offset = maxR-radius;
        return storage.block(centerRow+offset, centerCol+offset, 2*radius-1, 2*radius-1);
    }

protected:
    int maxR;
    MatS storage;
};

Use it like this:

int TestGetMatrixSection(Eigen::MatrixXi const & matrixToTest, int trials=1)
{
    int maxR = std::max(matrixToTest.rows(), matrixToTest.cols());
    ExtendedMatrix<int> eMat(matrixToTest, maxR);
    int result = 0;
    for(int t = 0; t < trials; ++t) {
        // eMat.getBaseMatrix().setRandom(); // or change individual entries ...
        for (int i = 1; i <= std::max(matrixToTest.rows(), matrixToTest.cols()); ++i) {
            for (int j = 0; j < matrixToTest.rows(); ++j) {
                for (int k = 0; k < matrixToTest.cols(); ++k) {
//                    std::cout << GetMatrixSection(matrixToTest, j, k, i) << "/n/n"; // printout
                    result += eMat.getMatrixSection(j, k, i).cols();
                }
            }
        }
    }
    return result;
}