Search code examples
arraysexcelarray-formulasstandard-deviation

Excel: Solve the Standard Deviation of the Difference Between Arrays (Averages of Multiple Columns)


I have a matrix:

A   B   C   D   E   F
4   10  2   2   1   1
            3   7   9
2   6   6           
2       10  2   0   4
2   0   1       6   5

2   6   4   5   0   6
3               8   
5   1   10  2   2   8

I want to find the standard deviation (STDEV.S) of the row-wise differences between the averages of columns A:C and D:F. Critically, I only want to calculate this for rows that have full entries in columns A:C and D:F (e.g., no blanks).

In this example, the standard deviation would be 1.9. Let's break this down. First, we determine the row-wise average for columns A:C and D:F (see columns G and H below). Notice that this has only been calculated for rows 1, 7, and 9 because they are the only rows that have complete data in columns A:F:

A   B   C   D   E   F   G   H
4   10  2   2   1   1   5.3 1.3
            3   7   9       
2   6   6                   
2       10  2   0   4       
2   0   1       6   5       

2   6   4   5   0   6   4   3.7
3               8           
5   1   10  2   2   8   5.3 4

Then we take the row-wise difference between columns G and H (see column I):

A   B   C   D   E   F   G   H   I
4   10  2   2   1   1   5.3 1.3 4
            3   7   9           
2   6   6                       
2       10  2   0   4           
2   0   1       6   5           

2   6   4   5   0   6   4   3.7 0.3
3               8               
5   1   10  2   2   8   5.3 4   1.3

And finally calculate the standard deviation of column I, which is 1.9.

I am looking to achieve this within a single array formula. I have already attempted the following but failed:

Attempt 1:

{=STDEV.S(AVERAGE(IF(SUBTOTAL(2,OFFSET(A1,ROW(A1:A9)-ROW(A1),0,1,COLUMNS(A1:C1)))=COLUMNS(A1:A1),IF(SUBTOTAL(2,OFFSET(D1,ROW(D1:D9)-ROW(D1),0,1,COLUMNS(D1:F1)))=COLUMNS(D1:F1),SUBTOTAL(1,OFFSET(A1,ROW(A1:A9)-ROW(A1),0,1,COLUMNS(A1:C1)))))))-(AVERAGE(IF(SUBTOTAL(2,OFFSET(A1,ROW(A1:A9)-ROW(A1),0,1,COLUMNS(A1:C1)))=COLUMNS(A1:C1),IF(SUBTOTAL(2,OFFSET(D1,ROW(D1:D9)-ROW(D1),0,1,COLUMNS(D1:F1)))=COLUMNS(D1:F1),SUBTOTAL(1,OFFSET(D1,ROW(D1:D9)-ROW(D1),0,1,COLUMNS(D1:F1)))))))}

Results in a #DIV/0! warning after entering with Ctrl + Shift + Enter

Attempt 2:

I successfully computed the standard deviation of the difference between two columns (rather than two averaged columns) using the following formula (in this case, we are interested in the SD of the difference between columns A and B):

{=STDEV(IF(A1:A9<>"",IF(B1:B9<>"",A1:A9-B1:B9)))}

I couldn't quite adapt this code myself but perhaps someone may find it useful.

Any suggestions would be greatly appreciated.


Solution

  • The main issue is that you are using SUBTOTAL(1... to get the average of columns A-C and separately for columns D-F, but then also using AVERAGE to get the average of the averages - a step too far. The result is that you try and take the stdev of a single number which gives you #DIV/0!

    If you take those out and make a couple of minor adjustments you get

    =STDEV.S(IF(SUBTOTAL(2,OFFSET(A1,ROW(A1:A9)-ROW(A1),0,1,COLUMNS(A1:C1)))=COLUMNS(A1:C1),IF(SUBTOTAL(2,OFFSET(D1,ROW(D1:D9)-ROW(D1),0,1,COLUMNS(D1:F1)))=COLUMNS(D1:F1),SUBTOTAL(1,OFFSET(A1,ROW(A1:A9)-ROW(A1),0,1,COLUMNS(A1:C1)))))-IF(SUBTOTAL(2,OFFSET(A1,ROW(A1:A9)-ROW(A1),0,1,COLUMNS(A1:C1)))=COLUMNS(A1:C1),IF(SUBTOTAL(2,OFFSET(D1,ROW(D1:D9)-ROW(D1),0,1,COLUMNS(D1:F1)))=COLUMNS(D1:F1),SUBTOTAL(1,OFFSET(D1,ROW(D1:D9)-ROW(D1),0,1,COLUMNS(D1:F1))))))
    

    with result 1.3. Why is this the wrong answer? Unfortunately when you get a blank cell in either cols A-C or D-F, your If statements evaluate to FALSE and you end up in some cases with FALSE-FALSE which evaluates to 0 and gives you a numeric value which is included by STDEV and gives the wrong result.

    You can solve it by re-factoring the formula, in words

    If first block is complete
        If second block is complete
            Calculate and subtract means
    Take Stdev of result
    

    which gives

    =STDEV.S(
    IF(SUBTOTAL(2,OFFSET(A1,ROW(A1:A9)-ROW(A1),0,1,COLUMNS(A1:C1)))=COLUMNS(A1:C1),
         IF(SUBTOTAL(2,OFFSET(D1,ROW(D1:D9)-ROW(D1),0,1,COLUMNS(D1:F1)))=COLUMNS(D1:F1),
              SUBTOTAL(1,OFFSET(A1,ROW(A1:A9)-ROW(A1),0,1,COLUMNS(A1:C1)))-SUBTOTAL(1,OFFSET(D1,ROW(A1:A9)-ROW(A1),0,1,COLUMNS(D1:F1)))))
    )