Tags: performance, matlab, time, kernel, function-handle

MATLAB optimization: speed up computation on large matrices


I am using the following function:

kernel = @(X,Y,sigma) exp((-pdist2(X,Y,'euclidean').^2)./(2*sigma^2));

to compute a series of kernels, in the following way:

K = [(1:size(featureVectors,1))', kernel(featureVectors,featureVectors, sigma)];

However, since featureVectors is a huge matrix (something like 10000x10000), it takes a really long time to compute the kernel matrix (i.e., K).

Is it possible to somehow speed up the computation?


EDIT: Context

I am training a classifier with libsvm, using a Gaussian kernel, as you may have noticed from the variable names and semantics.

I am currently using roughly 10000 terms and 10000 documents. This number of terms is what remained after stopword removal and stemming. This course indicates that having 10000 features makes sense.

Unfortunately, libsvm does not apply the Gaussian kernel automatically, so it has to be computed by hand. I took the idea from here, but the kernel computation (as suggested by the referenced question) is really slow.


Solution

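  • You are calling pdist2 with two identical input arguments (X and Y are the same matrix when you call kernel), so every pairwise distance is computed twice. You could save roughly half the time by computing each pair only once. To do that, use pdist, which returns each pair once as a vector, and then squareform to expand that vector into the full symmetric matrix: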

    kernel = @(X,sigma) exp((-squareform(pdist(X,'euclidean')).^2)./(2*sigma^2));
    K = [(1:size(featureVectors,1))', kernel(featureVectors, sigma)];
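
Before switching, it may be worth checking on a small input that the two versions agree. The following is a minimal sketch, not a benchmark: the matrix size, the value of sigma, and the names kernelFull, kernelHalf, K1, K2 and maxDiff are made up for illustration, and it assumes the Statistics and Machine Learning Toolbox (for pdist, pdist2 and squareform) is available.

```matlab
% Sketch: compare the original pdist2 kernel with the pdist/squareform one
% on small random data. Sizes and sigma are hypothetical stand-ins.
rng(0);                      % make the random data reproducible
X = rand(200, 50);           % stand-in for featureVectors
sigma = 1.5;                 % hypothetical kernel width

% Original version from the question (computes every pair twice)
kernelFull = @(X,Y,sigma) exp((-pdist2(X,Y,'euclidean').^2)./(2*sigma^2));
% Version from the answer (computes every pair once, then mirrors it)
kernelHalf = @(X,sigma) exp((-squareform(pdist(X,'euclidean')).^2)./(2*sigma^2));

K1 = kernelFull(X, X, sigma);
K2 = kernelHalf(X, sigma);

% Largest elementwise difference; expected to be at rounding-error level
maxDiff = max(abs(K1(:) - K2(:)));

% Rough timings of both versions (results depend on machine and data size)
tFull = timeit(@() kernelFull(X, X, sigma));
tHalf = timeit(@() kernelHalf(X, sigma));

% Prepend the sample-index column, as done in the question
K = [(1:size(X,1))', K2];
```

If maxDiff is tiny, the second version can be dropped in for the first. How much time is actually saved on the real 10000x10000 input is best measured there directly, since the small test size here says little about it.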