Tags: matlab, struct, vectorization

Summing fields in struct arrays that have the same inner structure


I have many MAT-files, each containing a struct array s with the same inner structure. Here is a minimal example of the struct array s that you would get from loading a single file:

s(1).A.a=rand(3);
s(1).A.b=rand(4);
s(1).B  =1;

s(2).A.a=rand(3);
s(2).A.b=rand(4);
s(2).B  =10;

In practice the struct array has hundreds of elements and tens of fields and sub-fields. Please don't comment on the way the files are saved; it is not in my control, and the question here is about how to deal with the information in these files.

Eventually I would like to average the information in each sub-field of these struct arrays across files, so a logical first step is to sum them (and then divide by the number of files).

A solution I have at the moment is this:

% initialize arrays of the same inner structure as `s`

sum_s_A_a = zeros(size(s(1).A.a,1), size(s(1).A.a,2), numel(s));
sum_s_A_b = zeros(size(s(1).A.b,1), size(s(1).A.b,2), numel(s));
sum_s_B = zeros(1, numel(s));

for jj = 1:100  % loop over all 100 files (just for the example)

    % load here each file that contains s

    for ii = 1:numel(s)  % loop over each element in s and add it to sum_s
        sum_s_A_a(:,:,ii) = sum_s_A_a(:,:,ii) + s(ii).A.a;
        sum_s_A_b(:,:,ii) = sum_s_A_b(:,:,ii) + s(ii).A.b;
        sum_s_B(ii) = sum_s_B(ii) + s(ii).B;
    end

end

This is extremely impractical, as there are dozens of fields and sub-fields in s, but the minimal example above works for the "single file" case if you use s as defined above.

I'd like to sum the information over all these files in a way similar to the for loop above, but without hard-coding the names of all fields and sub-fields into array names, and if possible without the for loop.

I don't mind whether the final container for the information is a struct, a cell or an array.


Solution

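  • Starting with your example, here is a way to sum a field of all the A sub-structs and return a numeric array without using a loop:

    function result = sumstruct (varargin)
      % tot1 is not a field of the example s in the question; replace it with
      % the sub-field of A to sum. Horizontal concatenation followed by
      % sum(..., 2) assumes that field is a scalar or a column vector.
      v = [varargin{:}];            % concatenate the input struct arrays
      s = [v.A];                    % struct array of all A sub-structs
      result = sum([s.tot1], 2);    % sum across all elements and all inputs
    end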
    

    and call it as:

    result = sumstruct (s1, s2, s3);
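
    As a minimal sketch of what the concatenation steps do, the script below inlines the body of sumstruct on made-up data. The field name tot1 and all values are assumptions for illustration, since the question's example has no such field. The last line is a hypothetical variation that keeps one sum per element of s instead of a single grand total.

    ```matlab
    % Made-up inputs: three 1x2 struct arrays whose A sub-struct has a scalar tot1.
    s1(1).A.tot1 = 1;   s1(2).A.tot1 = 2;
    s2(1).A.tot1 = 10;  s2(2).A.tot1 = 20;
    s3(1).A.tot1 = 100; s3(2).A.tot1 = 200;

    v = [s1, s2, s3];            % 1x6 struct array
    a = [v.A];                   % 1x6 struct array holding the A sub-structs
    total = sum([a.tot1], 2);    % grand total over all elements and all inputs

    % Variation (not in the answer): one sum per element of s.
    % Rows index the elements of s, columns index the inputs.
    perElement = sum(reshape([a.tot1], numel(s1), []), 2);
    ```

    Here total is 333 and perElement is [111; 222]. The per-element form is closer to what the loop in the question computes for scalar fields.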
    

    EDIT:

    However, if you also want to sum the other fields and their sub-fields and combine them into a struct, you need to use a loop or cellfun. Here is a solution that recursively reduces a nested structure:

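    function result = reduce(fcn, varargin)
      % Non-struct values are stacked along dim 3; struct values stay in a cell.
      fcns0 = {@(x)cat(3, x{:}), @(x)x};
      switcher0 = @(tf, s)fcns0{tf+1}(s);
      % Stacked arrays are passed to fcn; cells of structs are reduced recursively.
      fcns = {@(s)fcn(s), @(s)reduce(fcn, s{:})};
      switcher = @(tf, s)fcns{tf+1}(s);
      % Field values of every input, concatenated along dim 3.
      c = cellfun(@(x){struct2cell(x)}, varargin);
      s0 = cat(3, c{:});
      % One row per (field, element) pair, one column per input.
      s1 = reshape(s0, [], numel(varargin));
      s2 = cellfun(@(x){switcher0(isstruct(x{1}), x)}, num2cell(s1, 2));
      % Back to the shape of a single input's struct2cell output.
      s3 = reshape(s2, size(c{1}));
      s4 = cellfun(@(c){switcher(iscell(c), c)}, s3);
      % Rebuild a struct (array) with the original field names.
      fnames = fieldnames(varargin{1});
      result = cell2struct(s4, fnames, 1);
    end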
    

    The first argument is a function handle used for the reduction; the remaining arguments are struct arrays with identical fields.
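
    To make the mechanism easier to follow, here is a simplified sketch of a single, non-recursive level of that reduction. It uses a plain loop instead of cellfun, handles only flat scalar structs, and the variables p and q and their values are made up for illustration.

    ```matlab
    % Two scalar structs with the same flat fields (made-up values).
    p.a = [1 2; 3 4];      p.B = 1;
    q.a = [10 20; 30 40];  q.B = 10;

    fnames = fieldnames(p);                    % {'a'; 'B'}
    vals = [struct2cell(p), struct2cell(q)];   % rows = fields, columns = inputs

    reduced = cell(numel(fnames), 1);
    for k = 1:numel(fnames)
      stacked = cat(3, vals{k, :});            % stack field k of every input along dim 3
      reduced{k} = sum(stacked, 3);            % the reduction, here a sum over inputs
    end
    result = cell2struct(reduced, fnames, 1);  % struct with the original field names
    ```

    After this runs, result.a is [11 22; 33 44] and result.B is 11. The recursive function does the same per field, and calls itself whenever a field value is itself a struct.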

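    Use a loop to load all of the files, then call reduce:

    c = cell (1, 100);
    for i = 1:100
      c{i} = load('file');   % 'file' is a placeholder for the name of the i-th file
    end
    result = reduce(@(x)sum(x, 3), c{:});    % sum over files
    result = reduce(@(x)x ./ 100, result);   % divide by the number of files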
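
    In the loop above, 'file' is a placeholder, and load with an output argument returns a struct whose fields are the saved variables, so the struct array ends up one level down (as data.s below). A rough sketch of iterating over real file names follows. The scratch folder, the file names f1.mat and f2.mat, and the single field B are assumptions made only so the example runs on its own.

    ```matlab
    % Create two small demo files in a scratch folder (made-up contents).
    folder = tempname;
    mkdir(folder);
    s = struct('B', {1, 10});  save(fullfile(folder, 'f1.mat'), 's');
    s = struct('B', {2, 20});  save(fullfile(folder, 'f2.mat'), 's');

    files = dir(fullfile(folder, '*.mat'));   % one entry per MAT-file
    total = zeros(1, 2);
    for i = 1:numel(files)
      data = load(fullfile(folder, files(i).name));  % data.s is the struct array
      total = total + [data.s.B];                    % accumulate field B per element
    end
    avgB = total / numel(files);
    ```

    With these two files avgB is [1.5 15]. In the reduce-based loops, the same dir listing could supply the name passed to load on each iteration.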
    

    Alternatively, you can load the files incrementally and reduce as you go:

    result = [];
    for i = 1:100
      s = load('file');   % 'file' is a placeholder for the name of the i-th file
      if i == 1
        result = s;
      else
        result = reduce(@(x)sum(x, 3), result, s);   % running sum
      end
    end
    result = reduce(@(x)x ./ 100, result);           % divide by the number of files
    

    Note that the reduction function must operate along the third dimension of its input, because the values from the different inputs are stacked along that dimension; that is why it is written as sum(x, 3).
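
    A small illustration of that point, with made-up 2x2 matrices standing in for one field taken from three files:

    ```matlab
    % Stack the same field from three inputs along the third dimension.
    x = cat(3, [1 2; 3 4], [10 20; 30 40], [100 200; 300 400]);   % 2x2x3

    total = sum(x, 3);    % element-wise sum across the three inputs
    avg = mean(x, 3);     % element-wise average across the three inputs
    colsum = sum(x);      % default dimension 1: sums within each matrix instead
    ```

    Here total is [111 222; 333 444] and avg is [37 74; 111 148], while colsum has size 1x2x3 and mixes values within each input. When all inputs are passed to the reduction in one call, mean(x, 3) gives the average directly; with the incremental version the running sum followed by a final division is still needed.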