matlab, multidimensional-array, indexing, dimensions

Why does nansum work for input that exceeds matrix dimensions?


I am wondering about MATLAB's nansum function.

When I use the example from the documentation

X = magic(3);
X([1 6:9]) = repmat(NaN, 1, 5);

X =

   NaN     1   NaN
     3     5   NaN
     4   NaN   NaN

and then call

>> nansum(X, 1)

ans =

     7     6     0

>> nansum(X, 2)

ans =

     1
     8
     4

it works as expected.

However, what I did not expect is that it also works for

>> nansum(X, 400)

ans =

     0     1     0
     3     5     0
     4     0     0

What is the reasoning here? Why doesn't this fail with an error saying that dim exceeds the matrix dimensions?


Solution

  • In MATLAB, every array has infinitely many trailing singleton dimensions.

    A singleton dimension is a dimension, dim, where size(A,dim) = 1. It's called a trailing singleton dimension when it comes after all non-singleton dimensions (i.e. it doesn't change the structure of the matrix).
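
    A quick way to check this definition yourself is sketched below. It uses only size, ndims and ordinary indexing; the variable name A is just for illustration.

    ```matlab
    A = ones(3,4);
    ndims( A )       % >> ans = 2, only the non-trailing dimensions are counted
    size( A, 3 )     % >> ans = 1, the first trailing singleton dimension
    size( A, 400 )   % >> ans = 1, and so is the 400th
    A(2,3,1,1,1)     % >> ans = 1, extra subscripts of 1 index into the trailing singleton dims
    ```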

    Any function (including nansum) that can operate along a specific dimension can do so along any one of these trailing singleton dimensions. Often you won't see any effect (for instance, using max or sum in this way simply returns the input [1]), but nansum replaces NaN with zero, so that is the only change you see.

    Note that nansum(A,dim) is the same as sum(A,dim,'omitnan'). You can see this by typing edit nansum. My examples therefore use sum for simplicity. See the bottom of this answer for references about the defined behaviour.
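
    Applied to the matrix from the question, that equivalence can be sketched as follows. This assumes a release where sum accepts the 'omitnan' flag; Y and Z are illustrative names, not anything from the documentation.

    ```matlab
    X = magic(3);
    X([1 6:9]) = NaN;
    Y = sum( X, 400, 'omitnan' );  % each "sum" covers a single element; a NaN is omitted, leaving 0
    Z = X;
    Z(isnan(Z)) = 0;               % X with every NaN replaced by zero
    isequal( Y, Z )                % expected true, matching the nansum(X, 400) output in the question
    isequaln( sum( X, 400 ), X )   % true: without 'omitnan', sum just returns X, NaNs included
    ```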

    Let's try to visualise this:

    A = ones(3,4);
    size( A ) % >> ans = [3, 4]
    % Under the hood:
    % size( A ) = [3, 4, 1, 1, 1, 1, ...]
    sum( A, 1 )   % Sum through the rows, or the 1st dimension, which has 3 elements per sum
                  % >> ans = [3 3 3 3]
    sum( A, 2 )   % Sum through the columns, or the 2nd dimension, which has 4 elements per sum
                  % >> ans = [4; 4; 4]
    sum( A, 400 ) % Sum through the ???, the 400th dimension, which has 1 element per sum
                  % >> ans = [1 1 1 1; 1 1 1 1; 1 1 1 1]
    

    If you wanted, you could reshape the original matrix to have singleton 2nd through 399th dimensions to further this:

    % Set up dimensions as [3, 1, 1, ..., 1, 1, 4], for a 400-D array!
    dims = num2cell( [3 ones(1,398), 4] );
    % Note we'll now still have trailing singleton dims, but have 398 in the structure too
    B = reshape( A, dims{:} ); 
    

    Now we can do a similar sum example. The final thing to know is that squeeze removes the singleton dimensions from the structure of an array (the implicit trailing ones always remain), so we can use it to tidy up the outputs:

    sum( B, 1 ); % >> ans(:,:,1,1,1,...,1) = 3 
                 % >> ans(:,:,1,1,1,...,2) = 3
                 % >> ans(:,:,1,1,1,...,3) = 3
                 % >> ans(:,:,1,1,1,...,4) = 3
    squeeze( sum( B, 1 ) ); % >> ans = [3; 3; 3; 3] 
    
    % similarly  
    squeeze( sum( B, 2 ) );   % >> ans = [1 1 1 1; 1 1 1 1; 1 1 1 1]
    squeeze( sum( B, 400 ) ); % >> ans = [4; 4; 4]
    

    We can see that, now that we've reshaped things, summing along the 400th dimension does what summing along the 2nd dimension originally did, and vice versa. This would be easier to visualise if you replaced 400 with 3!
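
    For reference, here is a sketch of the same experiment with 3 in place of 400, which is small enough to inspect by hand. B3 is an illustrative name.

    ```matlab
    A  = ones(3,4);
    B3 = reshape( A, [3 1 4] );   % size [3, 1, 4]: the 4 columns now lie along the 3rd dimension
    squeeze( sum( B3, 1 ) )   % >> ans = [3; 3; 3; 3]
    squeeze( sum( B3, 2 ) )   % >> ans = [1 1 1 1; 1 1 1 1; 1 1 1 1], dim 2 is singleton so nothing is summed
    squeeze( sum( B3, 3 ) )   % >> ans = [4; 4; 4]
    ```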


    [1] See the sum and max documentation for examples where the behaviour is explicitly defined for the case "if dim is greater than ndims(A)". In both cases the function simply returns A, which keeps the implementation efficient. nansum, by contrast, still has to do some computation in case any elements are NaN.