MATLAB Cell Array - Average two values if another column matches


I have a cell array in which some of the entries have two data points. I want to average the two data points if the data were collected on the same day.

The first column of cell array 'site' is the date. The fourth column is the data concentration. I want to average the fourth column if the data comes from the same day.

For example, if my cell array looks like this:

01/01/2011  36-061-0069   1   10.4
01/01/2011  36-061-0069   2   10.1
01/04/2011  36-061-0069   1   7.9
01/05/2011  36-061-0069   1   13

I want to average the fourth column (10.4 and 10.1) into one row and leave everything else the same.

Help? Would an if elseif loop work? I'm not sure how to approach this issue, especially since cell arrays work a little differently than matrices.


Solution

  • You can do it succinctly without a loop, using a combination of unique, diff and accumarray.

    Define data:

    data = {'01/01/2011'  '36-061-0069'  '1'  '10.4';
            '01/01/2011'  '36-061-0069'  '2'  '10.1';
            '01/04/2011'  '36-061-0069'  '1'  '7.9';
            '01/05/2011'  '36-061-0069'  '1'  '13'};
    

    Then:

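    dates = datenum(data(:,1),'mm/dd/yyyy'); % change the format string for other date formats
    [dates_sort, ind_sort] = sort(dates);    % sort so that equal dates are adjacent
    [~, ii, jj] = unique(dates_sort);        % ii: last row of each date (see compatibility note below); jj: date group of each row
    n = diff([0; ii]);                       % number of rows per date
    result = accumarray(jj,str2double(data(ind_sort,4)))./n; % per-date sum divided by count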
    

    gives the desired result:

    result =
    
       10.2500
        7.9000
       13.0000
    

    If needed, you can get the non-repeated, sorted dates with data(ind_sort(ii),1).
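
    To get back to one row per day, as in the question, those dates can be paired with the averages. The following is a minimal sketch under some assumptions: the names vals, avg and daily are made up for illustration, and it averages with accumarray's function-handle argument (@mean) rather than the sum-and-divide step, so ii is only used to pick one representative row per date:

    ```matlab
    data = {'01/01/2011'  '36-061-0069'  '1'  '10.4';
            '01/01/2011'  '36-061-0069'  '2'  '10.1';
            '01/04/2011'  '36-061-0069'  '1'  '7.9';
            '01/05/2011'  '36-061-0069'  '1'  '13'};

    dates = datenum(data(:,1), 'mm/dd/yyyy');
    [dates_sort, ind_sort] = sort(dates);
    [~, ii, jj] = unique(dates_sort);

    % Average column 4 within each date group.
    vals = str2double(data(ind_sort, 4));
    avg  = accumarray(jj(:), vals, [], @mean);

    % One row per day: date, site ID, averaged concentration (hypothetical layout).
    daily = [data(ind_sort(ii), [1 2]), num2cell(avg)];
    ```

    Here daily is a 3-by-3 cell array. The sample-number column is left out because it no longer identifies a single measurement once rows have been averaged.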

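    Explanation of the code: the dates are first converted to numbers and sorted, so that rows with the same date are adjacent. unique then returns, for each distinct date, the index of its last row (ii) and, for each row, the index of its date group (jj). Finally, accumarray sums the data within each group, and dividing by the number of rows per group (n) gives the averages.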
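
    These steps can be made explicit with intermediate variables. This is only an illustrative sketch: the names vals, sums and counts are not part of the code above, and the counts are taken with accumarray(jj,1) instead of diff, which is a swapped-in way of counting the rows in each group:

    ```matlab
    data = {'01/01/2011'  '36-061-0069'  '1'  '10.4';
            '01/01/2011'  '36-061-0069'  '2'  '10.1';
            '01/04/2011'  '36-061-0069'  '1'  '7.9';
            '01/05/2011'  '36-061-0069'  '1'  '13'};

    dates = datenum(data(:,1), 'mm/dd/yyyy');  % date strings -> serial date numbers
    [dates_sort, ind_sort] = sort(dates);       % equal dates become adjacent
    [~, ~, jj] = unique(dates_sort);            % jj = [1; 1; 2; 3]: date group of each row

    vals   = str2double(data(ind_sort, 4));     % [10.4; 10.1; 7.9; 13]
    sums   = accumarray(jj(:), vals);           % per-date sums:   [20.5; 7.9; 13]
    counts = accumarray(jj(:), 1);              % per-date counts: [2; 1; 1]
    result = sums ./ counts;                    % per-date averages
    ```

    Because the counts come from jj rather than ii, this variant does not depend on whether unique reports the first or the last occurrence of each date.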

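    Compatibility note for MATLAB R2013a onwards:

    The behavior of unique changed in R2013a: its second output now indexes the first occurrence of each value instead of the last, so diff([0; ii]) no longer gives the number of rows per date. From that release onwards, add the 'legacy' flag to unique, i.e. replace the line [~, ii, jj] = unique(dates_sort) with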

    [~, ii, jj] = unique(dates_sort,'legacy')
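
    The effect of the flag can be seen on a small vector. This sketch assumes R2013a or later (the 'legacy' flag does not exist in older releases) and uses a made-up vector x in place of dates_sort:

    ```matlab
    x = [5; 5; 7; 9];                     % sorted values, the first one repeated

    [~, ii_new] = unique(x);              % index of the first occurrence of each value
    [~, ii_old] = unique(x, 'legacy');    % index of the last occurrence of each value

    n_new = diff([0; ii_new(:)]);         % [1; 2; 1]: not the group sizes
    n_old = diff([0; ii_old(:)]);         % [2; 1; 1]: the group sizes
    ```

    Only the last-occurrence indices turn into group sizes under diff, which is why the flag is needed for the sum-and-divide step.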