Search code examples
arraysmatlabmatrixmultidimensional-arrayaccumarray

Matlab Accumarray for 3D matrix


I really wish the documentation was more clear on the use of accumarray for N-D size matrices in Matlab. Regardless, I'm totally confused here and looking for some advice. Here's the challenge:

I have a 3D matrix of data.

  • ROWS are individual rivers
  • COLUMNS are dates of observations
  • PAGES are the time intervals of data collection

For this example, let's assume each observation is volume of water flowing through the meter in a 5 minute interval. However, I now wish to resample the data to intervals of N minutes (obviously multiples of 5). Let's choose NMINS = 15.

So, what I want to do is find the sum or mean of the volume of water over NMINS minute intervals. That is, the ROWS and COLUMNS will not change, but the dimensions and values for the PAGES will be decimated/aggregated.

I can get the grouping values for the pages. That is, the values I need to group by. If it was a single river for a single day, no problem. But, I have hundreds of days and dozens of rivers.

I've gotten this far:

CREATE TEST TIME VECTOR

NMINS   = 15; % Bucket by every 15 mins or 20, etc.
N5MINS  = 5 * 12 * 24 * 2; % Keep small - Two days of 5 min datenums
dnums   = datenum(2016,3,20,0,0:5:N5MINS,0);
% Trim dnums to start at random time for edge case and keep only mins
mins    = rem(dnums(25:end-30),1)';    % Create column vector

CREATE RANDOM MATRIX FOR TESTING

rng(123); % Set seed for reproducibility
X       = randi(100,12,9,length(mins)); % Test matrix

FIND TIMES IN TERMS OF MINUTES

[~,~,~,H,M] = datevec( mins );
H       = 60 .* (H - H(1));

NOW FIND ALL TIMES THAT CORRESPOND TO OUR BUCKET

idxMIN  = mod( M+H, NMINS )==0;
idxNewP = idxMIN;          % This is used to align the new river matrix
[R,C,P] = size(X);         % We'll drop P and use newP
newP    = sum(idxNewP); % Number of PAGES in final matrix (new)
% Final newX will have dimensions [R,C,newP]

RESET THE GROUPING INDICES

% Must shift one all as minute intervals represent data UP to that point
% for the actual grouping of data. Test if first bucket is a 
% match and set accordingly
if idxMIN(1)
    idxMIN = [1; idxMIN(1:end-1)]; 
    subs = cumsum(idxMIN);
else 
    idxMIN = [0; idxMIN(1:end-1)];
    subs = cumsum(idxMIN) + 1 ;
end

ADDITION: The group size will not be consistent and I cannot make that assumption (sadly). Consider the following after running the above.

tsttbl = table();
tsttbl.dnumstr = datestr(mins(1:5));
tsttbl.Mins    = M(1:5);
tsttbl.subs    = subs(1:5);
tsttbl

Output shows first group is a size of 1:

tsttbl = 

    dnumstr     Mins    subs
    ________    ____    ____

     2:00 AM     0      1   
     2:05 AM     5      2   
     2:10 AM    10      2   
     2:15 AM    15      2   
     2:20 AM    20      3  

Last, ultimately, I'll need to pass custom functions. The above is a toy example to illustrate the problem quickly. My apologies for not being more clear.

END ADDITION

And this is where I stumble...

How do I set the subs values to apply along each page to use accumarray? I'm totally confused by the documentation. Or is this actually the wrong approach? For what it's worth I'm using Matlab 2015b.

Honestly, any help would be greatly appreciated.

ALTERNATE SOLUTION This hit me on the way home. Meshgrid is my friend...

Once the cells above have been run (note I changed the size of the matrix X), we create the indices for the entire matrix where the "indices" for the pages (i.e., times) are given by the values in "subs". To do this, I use meshgrid.

[days,rivers,pages] = meshgrid(1:C,1:R,subs);
grpvals = [rivers(:) days(:) pages(:)];
tst     = accumarray(grpvals,X(:),[R C newP],@sum);

Probably not the most memory efficient as I have to create essentially the days, rivers, and pages matrices, then wind up creating a new grpvals array of those. But, it has the advantage that now I can use accumarray and pass anonymous functions, @std, etc.

Hope this helps others!

Huge thanks to Luis.


Solution

  • If all groups have the same size

    You can do the aggregation as follows:

    1. reshape along a 4th dimension to build the groups you want to aggregate. The 3rd dimension now refers to elements of each group, and the 4th dimension refers to groups.
    2. sum along the 3rd dimension (each group).
    3. squeeze out the now-singleton 3rd dimension to recover a 3D array.

    Code:

    X = randi(9,2,3,6); %// example data. 3D array.
    G = 2; %// group size along 3rd dim. Divides size(X,3)
    result = squeeze(sum(reshape(X, size(X,1), size(X,2), G, []), 3));
    

    For example, with G = 2,

    X(:,:,1) =
         2     3     9
         4     5     9
    X(:,:,2) =
         3     8     2
         6     9     8
    X(:,:,3) =
         4     4     4
         1     1     7
    X(:,:,4) =
         9     9     8
         2     4     1
    X(:,:,5) =
         9     5     9
         3     5     8
    X(:,:,6) =
         9     1     3
         5     3     1
    

    gives

    result(:,:,1) =
         5    11    11
        10    14    17
    result(:,:,2) =
        13    13    12
         3     5     8
    result(:,:,3) =
        18     6    12
         8     8     9
    

    General case: groups with possibly different sizes

    Since accumarray doesn't work with a multidimensional array (or even a matrix) as second input, you can use matrix multiplication along the lines of this answer. For that you need to pack the first two dimensions of your 3D array into one dimension (which will be unpacked at the end), and from the group indices build a zero-one matrix that will give the desired result through matrix multiplication.

    X = randi(9,2,3,5); %// example data. 3D array.
    subs = [1 2 2 1 1]; %// indices of groups. Groups may differ in size, and indices
                        %// need not be sorted
    Y = reshape(X, [], size(X,3)); %// reshape into a matrix. Groups are along rows
    M = full(sparse(1:numel(subs), subs, 1)); %// indicator matrix from group indices
    result = reshape(Y*M, size(X,1), size(X,2), []); %// compute result and reshape 
    

    For example,

    X(:,:,1) =
         9     3     8
         6     8     8
    X(:,:,2) =
         3     8     3
         7     2     2
    X(:,:,3) =
         7     3     6
         2     8     5
    X(:,:,4) =
         7     4     5
         8     8     6
    X(:,:,5) =
         2     3     2
         2     8     8
    
    subs =
         1     2     2     1     1
    

    gives

    result(:,:,1) =
        18    10    15
        16    24    22
    result(:,:,2) =
        10    11     9
         9    10     7