matlab linear-regression multivariate-testing

"out of memory" error for mvregress in matlab


I am trying to use mvregress on data whose dimensionality is a few hundred (roughly 300 to 400). With 32 GB of RAM I cannot compute beta; I get an "out of memory" message. I couldn't find any documented limitation of mvregress that would prevent me from applying it to vectors of this dimensionality. Am I doing something wrong? Is there any way to run multivariate linear regression on my data?

Here is an example of what goes wrong:

dim=400;
nsamp=1000;
dataVariance = .10;
noiseVariance = .05;
mixtureCenters=randn(dim,1);
X=randn(dim, nsamp)*sqrt(dataVariance ) + repmat(mixtureCenters,1,nsamp);
N=randn(dim, nsamp)*sqrt(noiseVariance ) + repmat(mixtureCenters,1,nsamp);
A=2*eye(dim);
Y=A*X+N;
%without residual term:
A_hat=mvregress(X',Y');
%with residual term:
[B, y_hat]=mlrtrain(X,Y)

where

function [B, y_hat]=mlrtrain(X,Y)
[n,d] = size(Y);
Xmat = [ones(n,1) X];
Xmat_sz=size(Xmat);
Xcell = cell(1,n);
for i = 1:n
    Xcell{i} = [kron([Xmat(i,:)],eye(d))];
end
[beta,sigma,E,V] = mvregress(Xcell,Y);
B = reshape(beta,d,Xmat_sz(2))';
y_hat=Xmat * B ;
end


The error is:

Error using bsxfun
Out of memory. Type HELP MEMORY for your options.

Error in kron (line 36)
   K = reshape(bsxfun(@times,A,B),[ma*mb na*nb]);

Error in mvregress (line 319)
            c{j} = kron(eye(NumSeries),Design(j,:));

And this is the result of the whos command:

whos
  Name                  Size                Bytes  Class     Attributes

  A                   400x400             1280000  double              
  N                   400x1000            3200000  double              
  X                   400x1000            3200000  double              
  Y                   400x1000            3200000  double              
  dataVariance          1x1                     8  double              
  dim                   1x1                     8  double              
  mixtureCenters      400x1                  3200  double              
  noiseVariance         1x1                     8  double              
  nsamp                 1x1                     8  double   

Solution

  • Okay, I think I have a solution for you, short version first:

    dim=400;
    nsamp=1000;
    dataVariance = .10;
    noiseVariance = .05;
    mixtureCenters=randn(dim,1);
    X=randn(dim, nsamp)*sqrt(dataVariance ) + repmat(mixtureCenters,1,nsamp);
    N=randn(dim, nsamp)*sqrt(noiseVariance ) + repmat(mixtureCenters,1,nsamp);
    A=2*eye(dim);
    Y=A*X+N;
    
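    % Note: Y is dim-by-nsamp here, so n = 400 and d = 1000
    [n,d] = size(Y);
    Xmat = [ones(n,1) X];             % prepend a column of ones for the intercept
    Xmat_sz = size(Xmat);
    Xcell = cell(1,n);
    for i = 1:n
        % speye keeps each design matrix sparse instead of building a dense array
        Xcell{i} = kron(Xmat(i,:),speye(d));
    end
    [beta,sigma,E,V] = mvregress(Xcell,Y);
    B = reshape(beta,d,Xmat_sz(2))';  % one column of coefficients per response dimension
    y_hat = Xmat * B;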
    

    Strangely, I could not access the function's workspace; it did not appear in the call stack. This is why I placed the function body directly after the script here.

    Here is the explanation, which may also help you in the future. By the definition of kron, the Kronecker product of an m-by-n matrix and a p-by-q matrix has size (m*p)-by-(n*q). In your loop, each call combines one row of Xmat (1 by 1001) with eye(d) (1000 by 1000), which gives a 1000-by-1001000 matrix with about 10^9 elements. Each element takes up 8 bytes in double precision, so that is roughly 8 GB per cell, and you build four hundred of them. The total is about 3.2 terabytes of memory (or 2.9 tebibytes, if you prefer), well out of reach even with your grand 32 GB.
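
    The size estimate can be checked with a few lines of arithmetic. This is only a sketch: the sizes are copied from the script above, and the variable names (rowLen, bytesPerCell and so on) are made up for illustration. It counts the storage for the dense kron result of a single row Xmat(i,:) with eye(d), as in the loop.

    ```matlab
    % Dimensions taken from the script above
    n = 400;            % rows of Y, i.e. number of cells built in the loop
    d = 1000;           % columns of Y
    rowLen = 1001;      % length of one row of Xmat = [ones(n,1) X]

    % kron of a 1-by-rowLen row with a d-by-d identity is (1*d)-by-(rowLen*d)
    cellRows = 1 * d;
    cellCols = rowLen * d;
    bytesPerCell = cellRows * cellCols * 8;   % 8 bytes per double
    totalBytes = n * bytesPerCell;            % all cells stored as dense arrays

    fprintf('Each cell: %d x %d, %.2f GB dense\n', cellRows, cellCols, bytesPerCell/1e9);
    fprintf('All %d cells: %.2f TB dense\n', n, totalBytes/1e12);
    ```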

    One of your matrices, the eye one, contains mostly zeros, and since the result contains every pairwise product of elements, most of its entries are zero too. For exactly such cases MATLAB offers the sparse matrix format, which stores only the nonzero elements and so saves a lot of memory when most entries are zero. You can convert a full matrix to a sparse representation with sparse(X), or create a sparse identity matrix directly with speye(n), which is what I did above. The sparse property propagates to the result of kron, which you should now have enough memory for (I have a quarter of your memory available, and it works).
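
    To see the difference on a small scale, the following sketch compares the dense and sparse results of kron. The size d = 200 and the 5-element row are arbitrary stand-ins chosen so that the dense version fits comfortably in memory; they are not the sizes from the question.

    ```matlab
    d = 200;                         % small illustrative size, not the 1000 used above
    row = randn(1, 5);               % stand-in for one row of Xmat

    Kfull   = kron(row, eye(d));     % dense d-by-(5*d) result
    Ksparse = kron(row, speye(d));   % same values, stored sparsely

    infoFull   = whos('Kfull');
    infoSparse = whos('Ksparse');
    fprintf('full:   %d bytes\n', infoFull.bytes);
    fprintf('sparse: %d bytes (issparse = %d)\n', infoSparse.bytes, issparse(Ksparse));

    % Both representations hold identical values
    assert(isequal(Kfull, full(Ksparse)));
    ```

    The gap grows with d, because the dense result stores all d*d entries of each block while the sparse one stores only the d diagonal entries.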

    However, what remains is the problem Matthew Gunn mentioned in a comment. I get an error saying:

    Error using mvregress (line 260)
    Insufficient data to estimate either full or least-squares models.
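
    One possible reading of this message is that the design asks for more coefficients than there are response values. This is an assumption: the exact check inside mvregress is not shown here. The sketch below only counts both quantities for the sizes used in the script above; the names numCoeffs and numValues are illustrative.

    ```matlab
    % Sizes as used in the script above, where [n,d] = size(Y) and Y is 400-by-1000
    n = 400;                    % rows of Y, treated as observations
    d = 1000;                   % columns of Y, treated as response dimensions
    p = 1000;                   % columns of X, before adding the intercept column

    numCoeffs = (p + 1) * d;    % length of beta for the kron(Xmat(i,:), speye(d)) design
    numValues = n * d;          % number of scalar response values in Y

    fprintf('coefficients: %d, response values: %d\n', numCoeffs, numValues);
    ```

    For comparison, if the 1000 samples were treated as observations of 400-dimensional data (n = 1000, d = 400, p = 400), the same count would give 401*400 = 160400 coefficients against 400000 response values.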