I am wondering about MATLAB's nansum function.
When I use the example from the documentation
X = magic(3);
X([1 6:9]) = repmat(NaN, 1, 5);
X =
NaN 1 NaN
3 5 NaN
4 NaN NaN
and then call
>> nansum(X, 1)
ans =
7 6 0
>> nansum(X, 2)
ans =
1
8
4
it works as expected.
However, what I did not expect is that it also works for
>> nansum(X, 400)
ans =
0 1 0
3 5 0
4 0 0
What is the reasoning here? Why wouldn't this crash with the error that dim exceeds the matrix dimensions?
In MATLAB, all arrays/matrices have infinitely many trailing singleton dimensions.
A singleton dimension is a dimension, dim, where size(A,dim) = 1. It's called a trailing singleton dimension when it comes after all non-singleton dimensions (i.e. it doesn't change the structure of the matrix).
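One way to convince yourself of this is to query sizes beyond ndims. This is only a small sketch; the variable names A, r, c and p are chosen for illustration:

```matlab
A = ones(3,4);

ndims(A)              % 2: trailing singleton dimensions are not counted
size(A, 3)            % 1: the first trailing singleton dimension
size(A, 400)          % 1: still a singleton, however far out you ask
[r, c, p] = size(A);  % r = 3, c = 4, p = 1
```

Asking size for a dimension past ndims(A) never errors; it simply reports 1.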
Any function (including nansum) which can operate on a specific dimension can do so on any one of the infinitely many singleton dimensions. Often you won't see any effect (for instance, using max or sum in this way simply returns the input [1]), but nansum replaces NaN with zero, so that's all that happens.
Note that nansum(A,dim) is the same as sum(A,dim,'omitnan'). You can see this by typing edit nansum. So my examples use sum for ease. See the bottom of this answer for references about defined behaviour.
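To make the difference concrete, here is a rough sketch using the matrix from the question. It assumes a MATLAB release in which sum accepts the 'omitnan' flag; the names S and T are only for illustration:

```matlab
X = magic(3);
X([1 6:9]) = NaN;

S = sum(X, 400);             % one element per "sum", so X comes back with its NaNs
T = sum(X, 400, 'omitnan');  % NaNs are skipped, and a sum over nothing is 0

isequaln(S, X)                      % true (isequaln treats NaN as equal to NaN)
isequal(T, [0 1 0; 3 5 0; 4 0 0])   % true, matching the nansum(X, 400) output
```

So the only visible effect of summing along a trailing singleton dimension with NaNs omitted is that each NaN becomes 0.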
Let's try to visualise this:
A = ones(3,4);
size( A ) % >> ans = [3, 4]
% Under the hood:
% size( A ) = [3, 4, 1, 1, 1, 1, ...]
sum( A, 1 ) % Sum through the rows, or the 1st dimension, which has 3 elements per sum
% >> ans = [3 3 3 3]
sum( A, 2 ) % Sum through the columns, or the 2nd dimension, which has 4 elements per sum
% >> ans = [4; 4; 4]
sum( A, 400 ) % Sum through the ???, the 400th dimension, which has 1 element per sum
% >> ans = [1 1 1 1; 1 1 1 1; 1 1 1 1]
If you wanted, you could reshape the original matrix to have singleton 2nd through 399th dimensions to further this:
% Set up dimensions as [3, 1, 1, ..., 1, 1, 4], for a 400-D array!
dims = num2cell( [3 ones(1,398), 4] );
% Note we'll now still have trailing singleton dims, but have 398 in the structure too
B = reshape( A, dims{:} );
Now we can do a similar sum example. The final thing to know is that squeeze removes non-trailing singleton dimensions; we can use this to tidy up the outputs:
sum( B, 1 ); % >> ans(:,:,1,1,1,...,1) = 3
% >> ans(:,:,1,1,1,...,2) = 3
% >> ans(:,:,1,1,1,...,3) = 3
% >> ans(:,:,1,1,1,...,4) = 3
squeeze( sum( B, 1 ) ); % >> ans = [3; 3; 3; 3]
% similarly
squeeze( sum( B, 2 ) ); % >> ans = [1 1 1 1; 1 1 1 1; 1 1 1 1]
squeeze( sum( B, 400 ) ); % >> ans = [4; 4; 4]
Now that we've reshaped things, we can see that summing over the 400th dimension does what summing over the 2nd dimension originally did, and vice versa. This would be easier to visualise if you replaced 400 with 3!
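For reference, the same idea with 3 in place of 400 might look like the following sketch, where C is just an illustrative name for the reshaped array:

```matlab
A = ones(3,4);
C = reshape(A, [3 1 4]);  % 3-by-1-by-4: the old 2nd dimension is now the 3rd

size(C)                   % [3 1 4]
squeeze(sum(C, 1))        % [3; 3; 3; 3]
squeeze(sum(C, 2))        % ones(3,4), because dimension 2 is a singleton
sum(C, 3)                 % [4; 4; 4], the same as sum(A, 2) on the original
```

Here the singleton dimension sits in the middle rather than at the end, which is easier to picture as a stack of four 3-by-1 columns.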
[1] See the sum and max documentation as examples where the behaviour is explicitly defined "if dim is greater than ndims(A)". In both cases the implementation is made more efficient by just returning A. In the case of nansum there has to be some computation in case elements are NaN.