c++performance matlab mex function-handle

Matlab mex-file with mexCallMATLAB is almost 300 times slower than the corresponding m-file

I started implementing a few m-files in C++ in order to reduce run times. The m-files produce n-dimensional points and evaluate function values at these points. The functions are user-defined and they are passed to m-files and mex-files as function handles. The mex-files use mexCallMATLAB with feval for finding function values.

I constructed the below example where a function handle fn constructed in the Matlab command line is passed to matlabcallingmatlab.m and mexcallingmatlab.cpp routines. With a freshly opened Matlab, mexcallingmatlab evaluates this function 200000 in 241.5 seconds while matlabcallingmatlab evaluates it in 0.81522 seconds therefore a 296 times slow-down with the mex implementation. These times are the results of the second runs as the first runs seem to be larger probably due to some overhead associated first time loading the program etc.

I have spent many days searching online on this problem and tried some suggestions on it. I tried different mex compiling flags to optimize the mex but there was almost no difference in performance. A previous post in Stackoverflow stated that upgrading Matlab was the solution but I am using probably the latest version MATLAB Version: 8.1.0.604 (R2013a) on Mac OS X Version: 10.8.4. I did compile the mex file with and without –largeArrayDims flag but this didn’t make any difference either. Some suggested that the content of the function handle could be directly coded in the cpp file but this is impossible as I would like to provide this code to any user with any type of function with a vector input and real number output.

As far as I found out, mex files need to go through feval function for using a function handle whereas m-files can directly call function handles provided that Matlab version is newer than some version.

Any help would be greatly appreciated.

simple function handle created in the Matlab command line:

fn = @(x) x'*x

matlabcallingmatlab.m :

function matlabcallingmatlab( fn )
x = zeros(2,1); 
for i = 0 : 199999
    x(2) = i; 
    f = fn( x ); 
end

mexcallingmatlab.cpp:

#include "mex.h"
#include <cstring>

void mexFunction( int nlhs, mxArray *plhs[],
                  int nrhs, const mxArray *prhs[] )
{
    mxArray *lhs[1], *rhs[2]; //parameters to be passed to feval
    double f, *xptr, x[] = {0.0, 0.0}; // x: input to f and f=f(x)
    int n = 2, nbytes = n * sizeof(double);  // n: dimension of input x to f

    // prhs[0] is the function handle as first argument to feval
    rhs[0] = const_cast<mxArray *>( prhs[0] );

    // rhs[1] contains input x to the function
    rhs[1] = mxCreateDoubleMatrix( n, 1, mxREAL);
    xptr = mxGetPr( rhs[1] );

    for (int i = 0; i < 200000; ++i)
    {
        x[1] = double(i);   // change input 
        memcpy( xptr, x, nbytes );  // now rhs[1] has new x
        mexCallMATLAB(1, lhs, 2, rhs, "feval");
        f = *mxGetPr( lhs[0] );
    }
}

Compilation of mex file:

>> mex -v -largeArrayDims mexcallingmatlab.cpp

Solution

So I tried to implement this myself, and I think I found the reason for the slowness.

Basically your code have a small memory leak where you are not freeing the lhs mxArray returned from the call to mexCallMATLAB. It is not exactly a memory-leak, seeing that MATLAB memory manager takes care of freeing the memory when the MEX-file exits:

MATLAB allocates dynamic memory to store the mxArrays in plhs. MATLAB automatically deallocates the dynamic memory when you clear the MEX-file. However, if heap space is at a premium, call mxDestroyArray when you are finished with the mxArrays plhs points to.

Still explicit is better than implicit... So your code is really stressing the deallocator of the MATLAB memory manager :)

mexcallingmatlab.cpp

#include "mex.h"

#ifndef N
#define N 100
#endif

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // validate input/output arguments
    if (nrhs != 1) {
        mexErrMsgTxt("One input argument required.");
    }
    if (mxGetClassID(prhs[0]) != mxFUNCTION_CLASS) {
        mexErrMsgTxt("Input must be a function handle.");
    }
    if (nlhs > 1) {
        mexErrMsgTxt("Too many output arguments.");
    }

    // allocate output
    plhs[0] = mxCreateDoubleMatrix(N, 1, mxREAL);
    double *out = mxGetPr(plhs[0]);

    // prepare for mexCallMATLAB: val = feval(@fh, zeros(2,1))
    mxArray *lhs, *rhs[2];
    rhs[0] = mxDuplicateArray(prhs[0]);
    rhs[1] = mxCreateDoubleMatrix(2, 1, mxREAL);
    double *xptr = mxGetPr(rhs[1]) + 1;

    for (int i=0; i<N; ++i) {
        *xptr = i;
        mexCallMATLAB(1, &lhs, 2, rhs, "feval");
        out[i] = *mxGetPr(lhs);
        mxDestroyArray(lhs);
    }

    // cleanup
    mxDestroyArray(rhs[0]);
    mxDestroyArray(rhs[1]);
}

MATLAB

fh = @(x) x'*x;
N = 2e5;

% MATLAB
tic
out = zeros(N,1);
for i=0:N-1
    out(i+1) = feval(fh, [0;i]);
end
toc

% MEX
mex('-largeArrayDims', sprintf('-DN=%d',N), 'mexcallingmatlab.cpp')
tic
out2 = mexcallingmatlab(fh);
toc

% check results
assert(isequal(out,out2))

Running the above benchmark a couple of times (to warm it up), I get the following consistent results:

Elapsed time is 0.732890 seconds.    % pure MATLAB
Elapsed time is 1.621439 seconds.    % MEX-file

No where near the slow times you initially had! Still the pure MATLAB part is about twice as fast, probably because of the overhead of calling an external MEX-function.

(My system: Win8 running 64-bit R2013a)