I am trying to use mvregress on data with a dimensionality of a few hundred (300 to 400). With 32 GB of RAM, I cannot compute beta and get an "out of memory" message. I couldn't find any documented limitation of mvregress that would prevent me from applying it to vectors of this dimensionality. Am I doing something wrong? Is there any way to run multivariate linear regression on my data?
Here is an example of what goes wrong:
dim=400;
nsamp=1000;
dataVariance = .10;
noiseVariance = .05;
mixtureCenters=randn(dim,1);
X=randn(dim, nsamp)*sqrt(dataVariance ) + repmat(mixtureCenters,1,nsamp);
N=randn(dim, nsamp)*sqrt(noiseVariance ) + repmat(mixtureCenters,1,nsamp);
A=2*eye(dim);
Y=A*X+N;
%without intercept term:
A_hat=mvregress(X',Y');
%with intercept term:
[B, y_hat]=mlrtrain(X,Y)
where mlrtrain is defined as:
function [B, y_hat]=mlrtrain(X,Y)
[n,d] = size(Y);
Xmat = [ones(n,1) X];
Xmat_sz=size(Xmat);
Xcell = cell(1,n);
for i = 1:n
Xcell{i} = kron(Xmat(i,:),eye(d));
end
[beta,sigma,E,V] = mvregress(Xcell,Y);
B = reshape(beta,d,Xmat_sz(2))';
y_hat=Xmat * B ;
end
Error using bsxfun
Out of memory. Type HELP MEMORY for your options.
Error in kron (line 36)
K = reshape(bsxfun(@times,A,B),[ma*mb na*nb]);
Error in mvregress (line 319)
c{j} = kron(eye(NumSeries),Design(j,:));
And this is the output of the whos command:
whos
Name            Size        Bytes  Class   Attributes
A               400x400   1280000  double
N               400x1000  3200000  double
X               400x1000  3200000  double
Y               400x1000  3200000  double
dataVariance    1x1             8  double
dim             1x1             8  double
mixtureCenters  400x1        3200  double
noiseVariance   1x1             8  double
nsamp           1x1             8  double
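For reference, the kron call in the trace can be sized with a few lines of arithmetic. This is a rough sketch that assumes the trace comes from the first call, A_hat=mvregress(X',Y'), with X' and Y' both 1000x400 as the whos output implies, and that mvregress builds one such block per observation, as the c{j} in the trace suggests. The variable names are illustrative, not mvregress internals.

```matlab
% Rough size of c{j} = kron(eye(NumSeries), Design(j,:)) from the trace,
% ASSUMING the call was mvregress(X', Y') with X' and Y' both 1000x400.
nObs    = 1000;   % rows of X' and Y' (observations)
nSeries = 400;    % columns of Y' (response dimensions)
nPred   = 400;    % columns of X' (predictors, no intercept)

% kron of an nSeries-by-nSeries identity with a 1-by-nPred row
% is (nSeries*1)-by-(nSeries*nPred)
blockRows = nSeries * 1;
blockCols = nSeries * nPred;
bytesPerBlock = 8 * blockRows * blockCols;   % 8 bytes per double
totalBytes    = nObs * bytesPerBlock;        % one block per observation (assumed)

fprintf('block size: %d x %d, %.0f MB each\n', blockRows, blockCols, bytesPerBlock / 1e6);
fprintf('all %d blocks: %.0f GB\n', nObs, totalBytes / 1e9);
```

Under these assumptions, each block is only about half a gigabyte, but a thousand of them add up to far more than 32 GB.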
Okay, I think I have a solution for you, short version first:
dim=400;
nsamp=1000;
dataVariance = .10;
noiseVariance = .05;
mixtureCenters=randn(dim,1);
X=randn(dim, nsamp)*sqrt(dataVariance ) + repmat(mixtureCenters,1,nsamp);
N=randn(dim, nsamp)*sqrt(noiseVariance ) + repmat(mixtureCenters,1,nsamp);
A=2*eye(dim);
Y=A*X+N;
[n,d] = size(Y);
Xmat = [ones(n,1) X];
Xmat_sz=size(Xmat);
Xcell = cell(1,n);
for i = 1:n
Xcell{i} = kron(Xmat(i,:),speye(d)); % speye(d) makes each block sparse, so only its nonzeros are stored
end
[beta,sigma,E,V] = mvregress(Xcell,Y);
B = reshape(beta,d,Xmat_sz(2))';
y_hat=Xmat * B ;
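Why the final reshape recovers B: with Xcell{i} = kron(x, eye(d)), each predictor x(j) multiplies its own stacked slice of beta, so reshaping beta into d-by-p and transposing gives the p-by-d matrix B with Xcell{i}*beta equal to (x*B)'. The sketch below checks this on made-up sizes and values (d = 3, p = 4), chosen only for illustration; it is not part of the solution itself.

```matlab
% Check the coefficient layout used with kron(x, eye(d)) designs.
% Sizes and values are made up for illustration.
d = 3;                         % number of responses
p = 4;                         % number of predictors, including the intercept
x = [1, 2, -1, 0.5];           % one design row (1-by-p), leading 1 = intercept
beta = (1:p*d)';               % stacked coefficients, (p*d)-by-1

B   = reshape(beta, d, p)';    % p-by-d, same reshape as in the code above
lhs = kron(x, eye(d)) * beta;  % d-by-1 mean for this observation (cell design)
rhs = (x * B)';                % d-by-1 mean in ordinary matrix form

fprintf('max difference: %g\n', max(abs(lhs - rhs)));
```

Because all values here are small integers and halves, both sides are computed exactly and the difference is zero.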
Strangely, I could not access the function's workspace; it did not appear in the call stack. This is why I inlined the function's body into the script above.
Here's the explanation that might also help you in the future:
Looking at the kron definition, the Kronecker product of an m by n and a p by q matrix has size mp by nq. In your case, X and Y are passed untransposed, so [n,d] = size(Y) gives 400 rows and d = 1000 columns, and Xmat is 400 by 1001. Each Xcell{i} is the Kronecker product of one 1 by 1001 row of Xmat with a 1000 by 1000 identity, which is a 1000 by 1001000 matrix with about 10^9 elements. At 8 bytes per double element, that is roughly 8 GB per block, and you build four hundred of them (one per row), about 3.2 Terabytes in total (or about 2.9 Tebibytes, if you prefer), well out of reach even with your grand 32 GB.
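The per-block figures can be reproduced with plain arithmetic. This sketch assumes, as in the code above, that X and Y are passed untransposed, so [n,d] = size(Y) gives 400 rows and 1000 columns and Xmat is 400 by 1001; the variable names are only for illustration.

```matlab
% Memory estimate for the dense Xcell blocks built with kron(Xmat(i,:), eye(d)).
% ASSUMES Y is the untransposed 400x1000 matrix, as in the code above.
n = 400;          % rows of Y, one Xcell block per row
d = 1000;         % columns of Y (response dimensions)
p = 1 + 1000;     % columns of Xmat = [ones(n,1) X]

% kron of a 1-by-p row with a d-by-d identity is (1*d)-by-(p*d)
elemsPerBlock = (1 * d) * (p * d);
bytesPerBlock = 8 * elemsPerBlock;   % 8 bytes per double
totalBytes    = n * bytesPerBlock;   % all n blocks held in the cell array

fprintf('elements per block: %d\n', elemsPerBlock);
fprintf('memory per block: %.2f GB\n', bytesPerBlock / 1e9);
fprintf('memory for all %d blocks: %.2f TB\n', n, totalBytes / 1e12);
```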
Since one of your factors, the eye matrix, contains mostly zeros, and the Kronecker product consists of all pairwise element products, most elements of the result are zero, too. For exactly such cases MATLAB offers the sparse matrix format, which stores only the nonzero elements and therefore saves a lot of memory when most elements are zero. You can convert a full matrix to sparse storage with sparse(X), or create an identity matrix directly in sparse form with speye(n), which is what I did above. The sparse property propagates to the result of kron, which should now fit into your memory (I have a quarter of your memory available, and it works for me).
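To see the effect on a single block, the sketch below builds one block the way the fixed loop does, from a made-up row standing in for Xmat(i,:), and compares the number of elements a dense block would hold with the number of nonzeros the sparse one actually stores. It relies on kron returning a sparse result when one of its arguments is sparse, which is how MATLAB documents it.

```matlab
% One design block built like the fixed loop, kron(Xmat(i,:), speye(d)).
% The row is a made-up stand-in for Xmat(i,:): 1-by-1001 with no zeros.
d   = 1000;
row = [1, 1:1000];             % leading 1 mimics the ones(n,1) column
S   = kron(row, speye(d));     % sparse because speye(d) is sparse

% Dense element count versus what sparse storage actually keeps
fprintf('size: %d x %d\n', size(S, 1), size(S, 2));
fprintf('dense elements: %d, stored nonzeros: %d\n', numel(S), nnz(S));
fprintf('issparse: %d\n', issparse(S));
```

Sparse storage also keeps row indices and column pointers, so the memory saving is somewhat smaller than the 1000:1 ratio of element counts, but it is still the difference between gigabytes and megabytes per block.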
However, what remains is the problem Matthew Gunn mentioned in a comment. I get an error saying:
Error using mvregress (line 260) Insufficient data to estimate either full or least-squares models.