Search code examples
matlabpivot-tableaverageaccumarray

How to accumulate (average) data based on multiple criteria


I have a set of data where I have recorded values in sets of 3 readings (so as to be able to obtain a general idea of the SEM). I have them recorded in a list that looks as follows, which I am trying to collapse into averages of each set of 3 points:

Original Table

I want to collapse essentially each 3 rows into one row where the average data value is given for that set. In essence, it would look as follows:

Desired result

This is something I know how to do basically in Excel (i.e. using a Pivot table) but I am not sure how to do the same in MATLAB. I have tried using accumarray but struggle with knowing how to incorporate multiple conditions essentially. I would need to create a subs array where its number corresponds to each unique set of 3 data points. By brute force, I could create an array such as:

subs = [1 1 1; 2 2 2; 3 3 3; 4 4 4; ...]'

using some looping and have that as my subs array, but since it isn't tied to the data itself, and there may be strange hiccups throughout (i.e. more than 3 data points per set, or missing data, etc.). I know there must be some way to have this sort of Pivot-table-esque grouping for something like this, but need some help to get it off the ground. Thanks.

Here is the input data in text form:

Subject  Flow   On/Off   Values
1        10     1        2.20
1        10     1        2.50
1        10     1        2.60
1        20     1        5.50
1        20     1        6.10
1        20     1        5.90
1        30     1        10.10
1        30     1        10.50
1        30     1        10.50
1        10     0        1.90
1        10     0        2.20
1        10     0        2.30
1        20     0        5.20
1        20     0        5.80
1        20     0        5.60
1        30     0        9.80
1        30     0        10.20
1        30     0        10.20
2        10     1        5.70
2        10     1        6.00
2        10     1        6.10
2        20     1        9.00
2        20     1        9.60
2        20     1        9.40
2        30     1        13.60
2        30     1        14.00
2        30     1        14.00
2        10     0        5.40
2        10     0        5.70
2        10     0        5.80
2        20     0        8.70
2        20     0        9.30
2        20     0        9.10
2        30     0        13.30
2        30     0        13.70
2        30     0        13.70

Solution

  • I assume that

    • You want to average based on unique values of the first three columns (not on groups of three rows, although the two criteria coincide in your example);
    • Order is determined by column 1, then 3, then 2.

    Then, denoting your data as x,

    [~, ~, subs] = unique(x(:, [1 3 2]), 'rows', 'sorted');
    result = accumarray(subs, x(:,end), [], @mean);
    

    gives

    result =
        2.1333
        5.5333
       10.0667
        2.4333
        5.8333
       10.3667
        5.6333
        9.0333
       13.5667
        5.9333
        9.3333
       13.8667
    

    As you see, I am using the third output of unique with the 'rows' and 'sorted' options. This creates the subs grouping vector based on first three columns of your data in the desired order. Then, passing that to accumarray computes the means.