
Efficiently scaling columns of a sparse matrix in MATLAB


I have a sample matrix X that is sparse (~5% nonzeros), and I am trying to scale each column by a factor (basically tf-idf normalization).

I thought this would be an easy task, but it turns out not to be well supported. Here is what I used:

fac = log(size(X,1)./max(1,sum(X ~= 0))); % idf factor for each column
X = bsxfun(@times,X,fac); % this line gives an out of memory error
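Written out, the two lines above compute the following, with N = size(X,1) and the natural logarithm (MATLAB's `log`); the symbol names here are chosen for illustration:

```latex
f_j = \ln\frac{N}{\max\left(1,\ \mathrm{df}_j\right)}, \qquad
\mathrm{df}_j = \#\{\, i : X_{ij} \neq 0 \,\}, \qquad
X_{ij} \leftarrow X_{ij}\, f_j
```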

X is around 20,000 x 1,000,000, but only ~5% of the entries are nonzero, so memory shouldn't be a problem (the machine has 48 GB of RAM and could easily hold a full matrix with as many elements as X has nonzeros).
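For scale, a rough storage estimate, assuming double-precision values and that 64-bit MATLAB stores sparse matrices in compressed-column form with 8-byte indices (these storage details are an assumption, not something stated in the question):

```latex
\begin{aligned}
\mathrm{nnz}(X) &\approx 0.05 \times 20{,}000 \times 1{,}000{,}000 = 10^{9} \\
\text{sparse } X &\approx \underbrace{8\,\mathrm{nnz}}_{\text{values}}
  + \underbrace{8\,\mathrm{nnz}}_{\text{row indices}}
  + \underbrace{8\,(n+1)}_{\text{column pointers}}
  \approx 1.6 \times 10^{10}\ \text{bytes} \approx 16\ \text{GB} \\
\text{full } X &= 8 \times 2 \times 10^{10} = 1.6 \times 10^{11}\ \text{bytes} \approx 160\ \text{GB}
\end{aligned}
```

So X itself fits in 48 GB, but any intermediate that materializes the full 20,000 x 1,000,000 array cannot.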

I feel there must be an easy way to do this, since it is a very common operation on sparse matrices that hold data samples.

Thanks in advance


Solution

  • Yay for linear algebra! Column scaling is right multiplication by a diagonal matrix:

    X = X*diag(sparse(fac));
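A small self-contained MATLAB sketch that checks the diagonal-matrix trick on a toy matrix. The sizes, random data, the `spdiags` comparison and the `find`/`sparse` rebuild variant are additions for illustration, not part of the original answer:

```matlab
% Toy stand-in for X: sizes, density and seed are illustrative assumptions,
% not the 20,000 x 1,000,000 matrix from the question.
rng(0);
X = sprand(200, 500, 0.05);
[m, n] = size(X);

% Per-column idf-style factor, as in the question, kept as a dense row vector.
df  = full(sum(X ~= 0, 1));          % number of nonzeros in each column
fac = log(m ./ max(1, df));          % 1-by-n

% The answer's approach: right-multiply by an n-by-n sparse diagonal matrix.
D1 = diag(sparse(fac));
Y1 = X * D1;                         % sparse result, no full intermediate

% spdiags builds the same diagonal matrix directly from a column vector.
D2 = spdiags(fac(:), 0, n, n);

% Alternative: rescale only the stored nonzeros and rebuild the matrix.
[r, c, v] = find(X);                 % row indices, column indices, values
fcol = fac(:);                       % column vector, so fcol(c) is a column too
Y2 = sparse(r, c, v .* fcol(c), m, n);

% Dense reference, affordable only because this toy matrix is small.
Yref = bsxfun(@times, full(X), fac);
```

At the question's scale (about 10^9 nonzeros), `find` returns three vectors of that length, so the rebuild variant needs considerably more peak memory than the diagonal-matrix product; it is included only to make the per-nonzero arithmetic explicit.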