Eigen3: large overhead when subtracting two small matrices

I want to compute the difference between two small 640x512 matrices using the Eigen3 library, and I end up with a high computation latency (45 ms on an Intel Xeon, 16 cores @ 2.4 GHz). Could you give me some hints to bring this abnormal computation time down? Below is the related code snippet:

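#include <Eigen/Dense>
#include <ctime>
#include <iostream>

using namespace Eigen;
using namespace std;

// stTime and NSEC_PER_SEC are not defined in the snippet; these definitions are assumptions.
typedef struct timespec stTime;
constexpr long NSEC_PER_SEC = 1000000000L;

// Carry whole seconds out of tv_nsec
static inline void tsnorm(stTime *ts)
{
  while (ts->tv_nsec >= NSEC_PER_SEC)
  {
    ts->tv_nsec -= NSEC_PER_SEC;
    ts->tv_sec++;
  }
}

int main()
{
  const unsigned short usRawFrameRows = 640;
  const unsigned short usRawFrameCols = 512;
  using pixType = unsigned short;
  using pixDynMat = Matrix<pixType, Dynamic, Dynamic, RowMajor>;

  pixDynMat biasFrame = pixDynMat::Zero(usRawFrameRows, usRawFrameCols);
  // Value-initialize so the raw frame holds defined (zero) pixels
  pixType *myRawFrame = new pixType[usRawFrameRows * usRawFrameCols]();

  struct timespec tBeforeProcessFrameCall, tAfterProcessFrameCall;
  clock_gettime(CLOCK_MONOTONIC_RAW, &tBeforeProcessFrameCall);
  tsnorm(&tBeforeProcessFrameCall);

  // Subtract the bias from the current raw frame
  MatrixXd calFrame = Map<pixDynMat>(myRawFrame, usRawFrameRows, usRawFrameCols).cast<double>()
                      - biasFrame.cast<double>();

  clock_gettime(CLOCK_MONOTONIC_RAW, &tAfterProcessFrameCall);
  tsnorm(&tAfterProcessFrameCall);

  // Include tv_sec so the difference stays correct across a second boundary
  double elapsedMs = (tAfterProcessFrameCall.tv_sec - tBeforeProcessFrameCall.tv_sec) * 1e3
                   + (tAfterProcessFrameCall.tv_nsec - tBeforeProcessFrameCall.tv_nsec) / 1e6;
  cout << " PHI processFrame overhead (ms) = " << elapsedMs << endl;

  delete[] myRawFrame;
  return 0;
}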
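The same measurement can also be written with std::chrono, which yields a single duration and so removes the tv_sec/tv_nsec bookkeeping (and the need for tsnorm). This is a minimal sketch: elapsedMs is a hypothetical helper name, and std::chrono::steady_clock is monotonic but, on Linux with libstdc++, is typically backed by CLOCK_MONOTONIC rather than CLOCK_MONOTONIC_RAW.

```cpp
#include <chrono>

// Hypothetical helper: runs f once and returns the elapsed wall time in
// milliseconds, measured with the monotonic std::chrono::steady_clock.
template <typename F>
double elapsedMs(F&& f)
{
  const auto start = std::chrono::steady_clock::now();
  f();
  const auto stop = std::chrono::steady_clock::now();
  // duration<double, std::milli> converts the tick difference to fractional ms
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, wrapping only the subtraction in a lambda, `elapsedMs([&] { calFrame = ...; })`, times just that statement. Running it a few times and looking at the later results may help separate one-off costs, such as first-touch page faults on a freshly allocated buffer, from the arithmetic itself.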

Cheers!

Sylvain

Solution

I compiled your code on an i7-9700K:

Compiler: g++ -O3 -march=native test.cpp -o testbin
====================================================
PHI processFrame overhead (ms) = 0.952253

However, without optimizations:

Compiler: g++ test.cpp -o testbin
====================================================
PHI processFrame overhead (ms) = 20.1365
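A quick way to tell which of these two kinds of build a binary came from is the __OPTIMIZE__ macro, which GCC and Clang predefine when optimization is enabled (-O1 or higher). This is a sketch: builtWithOptimizations is a hypothetical name, and other compilers may not define this macro, in which case the function would always report false.

```cpp
// Hypothetical helper: true when GCC/Clang compiled this translation unit
// with optimizations enabled, detected via the predefined __OPTIMIZE__ macro.
constexpr bool builtWithOptimizations()
{
#ifdef __OPTIMIZE__
  return true;
#else
  return false;
#endif
}
```

Printing a warning at startup when it returns false makes an accidental unoptimized build of Eigen code hard to miss.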

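I suspect that you are compiling without optimizations. Try enabling them, e.g. with -O3; adding -march=native also lets the compiler use every instruction set your CPU supports. Here the optimized build was roughly 20 times faster, and according to the Eigen FAQ, optimizations can easily gain you a factor of ten or more (see http://eigen.tuxfamily.org/index.php?title=FAQ#Optimization).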