Search code examples
google-sheetsgoogle-sheets-formulaaveragearray-formulasgoogle-query-language

ArrayFormula of Average on Infinite Truly Dynamic Range in Google Sheets


as per example:

     A       B      C     D     E     F     G     ∞
  |======|=======|=====|=====|=====|=====|=====|=====
1 |      |AVERAGE|     |     |     |     |     |        
  |======|=======|=====|=====|=====|=====|=====|=====
2 | xx 1 |       |   1 |   2 | 0.5 |  10 |     |        
  |======|=======|=====|=====|=====|=====|=====|=====
3 | xx 2 |       |   7 |   1 |     |     |     |       
  |======|=======|=====|=====|=====|=====|=====|=====
4 |      |       |   0 |     |     |     |     |       
  |======|=======|=====|=====|=====|=====|=====|=====
5 | xx 3 |       |   9 |   8 |   7 |   6 |     |       
  |======|=======|=====|=====|=====|=====|=====|=====
6 | xx 4 |       |   0 |   1 |   2 |   1 |     |       
  |======|=======|=====|=====|=====|=====|=====|=====
7 |      |       |   1 |     |   4 |     |     |       
  |======|=======|=====|=====|=====|=====|=====|=====
8 | xx 5 |       |     |     |     |     |     |       
  |======|=======|=====|=====|=====|=====|=====|=====
9 |      |       |     |     |     |     |   5 |           
  |======|=======|=====|=====|=====|=====|=====|=====
∞ |      |       |     |     |     |     |     |       

what's the most optimal way of getting AVERAGE for every valid row in the dynamic sense of terms (unknown quantity of rows & unknown quantity of columns) ?



if you are here by accident for running / cumulative / rolling average see: https://stackoverflow.com/a/59120993/5632629


Solution

  • QUERY

    level 1:

    if all 5 cells in range C2:G have values:

    =QUERY(QUERY(C2:G, "select (C+D+E+F+G)/5"), "offset 1", )
    

    enter image description here

    if not, then rows are skipped:

    enter image description here

    if empty cells are considered as zeros:

    =INDEX(QUERY(QUERY({C2:G*1}, "select (Col1+Col2+Col3+Col4+Col5)/5"), "offset 1", ))
    

    enter image description here

    to remove zero values we use IFERROR(1/(1/...)) wrapping:

    =INDEX(IFERROR(1/(1/QUERY(QUERY({C2:G*1}, 
     "select (Col1+Col2+Col3+Col4+Col5)/5"), "offset 1", ))))
    

    enter image description here

    to make Col references dynamic we can do:

    =INDEX(IFERROR(1/(1/QUERY(QUERY({C2:G*1}, 
     "select "&
     "("&JOIN("+", "Col"&ROW(INDIRECT("1:"&COLUMNS(C:G))))&")/"&COLUMNS(C:G)), 
     "offset 1", ))))
    

    enter image description here


    level 2:

    if empty cells are not considered as zeros and shouldn't be skipped:

    =INDEX(TRANSPOSE(QUERY(TRANSPOSE(E2:I), 
     "select "&TEXTJOIN(",", 1, IF(A2:A="",,
     "avg(Col"&ROW(A2:A)-ROW(A2)+1&")")))),, 2)
    

    enter image description here

    note that this is column A dependant, so missing values in column A will offset the results

    fun fact !! we can swap avg to max or min:

    enter image description here

    to free it from confinement of column A and make it work for any valid row:

    =INDEX(IFERROR(1/(1/TRANSPOSE(QUERY(TRANSPOSE(
     IF(TRIM(TRANSPOSE(QUERY(TRANSPOSE(C2:G),,9^9)))="", C2:G*0, C2:G)), 
     "select "&TEXTJOIN(",", 1, 
     "avg(Col"&ROW(A2:A)-ROW(A2)+1&")"))))),, 2)
    

    enter image description here

    if present 0's in range shouldn't be averaged we can add a small IF statement:

    =INDEX(IFERROR(1/(1/TRANSPOSE(QUERY(TRANSPOSE(
     IF(TRIM(TRANSPOSE(QUERY(TRANSPOSE(
     IF(C2:G>0, C2:G, )),,9^9)))="", C2:G*0, 
     IF(C2:G>0, C2:G, ))), 
     "select "&TEXTJOIN(",", 1, 
     "avg(Col"&ROW(A2:A)-ROW(A2)+1&")"))))),, 2)
    

    enter image description here

    here we used so-called "vertical query smash" which takes all values in a given range and concentrates it to one single column, where all cells per each row are joined with empty space as a byproduct:

    =FLATTEN(QUERY(TRANSPOSE(C2:G),,9^9))
    

    enter image description here

    apart from this, there is also "horizontal query smash":

    =QUERY(C2:G,,9^9)
    

    enter image description here

    and also "ultimate 360° double query smash" which puts all cells from range into one single cell:

    =QUERY(FLATTEN(QUERY(TRANSPOSE(C2:G),,9^9)),,9^9)
    

    enter image description here

    and finally "the infamous negative 360° reverse double query smash" which prioritizes columns over rows:

    =QUERY(FLATTEN(QUERY(C2:G,,9^9)),,9^9)
    

    enter image description here

    all query smash names are copyrighted of course

    back to the topic... as mentioned above all cells per row in range are joined with empty space even those empty ones, so we got a situation where we getting double or multiple spaces between values. to fix this we use TRIM and introduce a simple IF statement to assign 0 values for empty rows in a given range eg. to counter the offset:

    enter image description here


    MMULT

    level 3:

    MMULT is a kind of heavy class formula that is able to perform addition, subtraction, multiplication, division even running total on arrays/matrixes... however, bigger the dataset = slower the formula calculation (because in MMULT even empty rows take time to perform + - × ÷ operation) ...unless we use truly dynamic range infinite in both directions...

    to get the last row with values of a given range:

    =INDEX(MAX(IF(TRIM(FLATTEN(QUERY(TRANSPOSE(
     INDIRECT("C2:"&ROWS(A:A))),,9^9)))="",,ROW(A2:A))))
    

    enter image description here

    to get the last column with values of a given range:

    =INDEX(MAX(IF(TRIM(QUERY(INDIRECT("C2:"&ROWS(A:A)),,9^9))="",,COLUMN(C2:2))))
    

    enter image description here

    now we can construct it in a simple way:

    =INDIRECT("C2:"&ADDRESS(9, 7))
    

    which is the same as:

    =INDEX(INDIRECT("C2:"&ADDRESS(MAX(IF(TRIM(FLATTEN(QUERY(TRANSPOSE(
     INDIRECT("C2:"&ROWS(A:A))),,9^9)))="",,ROW(A2:A))), 
     MAX(IF(TRIM(QUERY(INDIRECT("C2:"&ROWS(A:A)),,9^9))="",,COLUMN(C2:2))))))
    

    enter image description here

    or shorter alternative:

    =INDEX(INDIRECT("C2:"&ADDRESS(
     MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*ROW(A2:A)), 
     MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*COLUMN(C2:2)))))
    

    enter image description here

    therefore simplified MMULT formula would be:

    =ARRAYFORMULA(IFERROR(
     MMULT(N(   C2:G9),           ROW(INDIRECT("C1:C"&COLUMNS(C:G)))^0)/
     MMULT(N(IF(C2:G9<>"", 1, )), ROW(INDIRECT("C1:C"&COLUMNS(C:G)))^0)))
    

    enter image description here

    in case we want to exclude zero values from range, the formula would be:

    =ARRAYFORMULA(IFERROR(
     MMULT(N(   C2:G9),         ROW(INDIRECT("C1:C"&COLUMNS(C:G)))^0)/
     MMULT(N(IF(C2:G9>0, 1, )), ROW(INDIRECT("C1:C"&COLUMNS(C:G)))^0)))
    

    enter image description here

    level 4:

    putting together all above to make it infinitely dynamic and still restricted to valid dataset:

    =INDEX(IFERROR(
     MMULT(N(   INDIRECT("C2:"&ADDRESS(
     MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*ROW(A2:A)), 
     MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*COLUMN(C2:2))))),           ROW(INDIRECT("C1:C"&
     MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*COLUMN(C2:2))-(COLUMN(C2)-1)))^0)/
     MMULT(N(IF(INDIRECT("C2:"&ADDRESS(
     MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*ROW(A2:A)), 
     MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*COLUMN(C2:2))))<>"", 1, )), ROW(INDIRECT("C1:C"&
     MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*COLUMN(C2:2))-(COLUMN(C2)-1)))^0)))
    

    enter image description here

    again, not including cells with zeros in range:

    enter image description here


    LAMBDA

    level 5:

    since 20 September 2022, we can use new functions that make stuff easier:

    • MAKEARRAY
    • REDUCE
    • BYROW
    • BYCOL
    • SCAN
    • MAP
    • LAMBDA

    so to jump right in for a closed range we can take an average like:

    =IFERROR(BYROW(C2:G9, LAMBDA(x, AVERAGE(x))))
    

    enter image description here

    and to get an average column-wise we just replace BYROW with BYCOL. now to make the range open and truly dynamic we can modify the above formula like this:

    =IFERROR(BYROW(INDEX(INDIRECT("C2:"&ADDRESS(
     MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*ROW(A2:A)), 
     MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*COLUMN(C2:2))))), LAMBDA(x, AVERAGE(x))))
    

    enter image description here

    we can do it shorter by 12 characters like:

    =IFERROR(BYROW(INDEX(OFFSET(C2,,,
     MAX((INDIRECT("C2:"&ROWS(C:C))<>"")*ROW(C2:C)), 
     MAX((INDIRECT("C2:"&ROWS(C:C))<>"")*COLUMN(C2:2)))), LAMBDA(x, AVERAGE(x))))
    

    enter image description here

    to exclude zeros from the output:

    =INDEX(IFERROR(1/(1/BYROW(OFFSET(C2,,,
     MAX((INDIRECT("C2:"&ROWS(C:C))<>"")*ROW(C2:C)), 
     MAX((INDIRECT("C2:"&ROWS(C:C))<>"")*COLUMN(C2:2))), LAMBDA(x, AVERAGE(x))))))
    

    enter image description here

    to exclude zeros from input:

    =INDEX(IFERROR(1/(1/BYROW(OFFSET(C2,,,
     MAX((INDIRECT("C2:"&ROWS(C:C))<>"")*ROW(C2:C)), 
     MAX((INDIRECT("C2:"&ROWS(C:C))<>"")*COLUMN(C2:2))), LAMBDA(x, AVERAGEIF(x, ">0"))))))
    

    enter image description here

    or if blank cells should be treated as zeros:

    =INDEX(IFERROR(1/(1/BYROW(1*OFFSET(C2,,,
     MAX((INDIRECT("C2:"&ROWS(C:C))<>"")*ROW(C2:C)), 
     MAX((INDIRECT("C2:"&ROWS(C:C))<>"")*COLUMN(C2:2))), LAMBDA(x, AVERAGE(x))))))
    

    enter image description here

    also, it's worth mentioning the BYROW limitation of ~ 99990 rows


    honorable mentions:

    @Erik Tyler level:

    the polar opposite of the previous formula would be to run the MMULT on

    • total area of C2:? (all rows, all columns) instead of
    • valid area C2:? (excluding empty rows and columns) which avoids mass-calculations of 0 × 0 = 0

    including zeros:

    =INDEX(IFERROR(
     MMULT(   INDIRECT("C2:"&ROWS(C:C))*1,         SEQUENCE(COLUMNS(C2:2))^0)/ 
     MMULT(IF(INDIRECT("C2:"&ROWS(C:C))<>"", 1)*1, SEQUENCE(COLUMNS(C2:2))^0)))
    

    enter image description here

    excluding zeros:

    =INDEX(IFERROR(
     MMULT(   INDIRECT("C2:"&ROWS(C:C))*1,       SEQUENCE(COLUMNS(C2:2))^0)/ 
     MMULT(IF(INDIRECT("C2:"&ROWS(C:C))>0, 1)*1, SEQUENCE(COLUMNS(C2:2))^0)))
    

    0

    @kishkin level:

    for a fixed range C2:G9 the MMULT average would be:

    =INDEX(IFERROR(
     MMULT( C2:G9*1,    FLATTEN(COLUMN(C:G))^0)/ 
     MMULT((C2:G9>0)*1, FLATTEN(COLUMN(C:G))^0)))
    

    enter image description here

    =INDEX(IFNA(VLOOKUP(ROW(C2:C), 
     QUERY(SPLIT(FLATTEN(ROW(C2:C)&"×"&C2:J), "×"),
     "select Col1,avg(Col2)
      where Col2 is not null
      group by Col1"), 2, )))
    

    enter image description here

    @MattKing level:

    =INDEX(QUERY(SPLIT(FLATTEN(ROW(C2:C)&"×"&OFFSET(C2,,,9^9, 9^9)), "×"),
     "select avg(Col2) 
      group by Col1  
      label avg(Col2)''"))
    

    enter image description here

    excluding zeros:

    =INDEX(QUERY(SPLIT(FLATTEN(ROW(C2:C)&"×"&OFFSET(C2,,,9^9, 9^9)), "×"),
     "select avg(Col2)
      where Col2 <> 0 
      group by Col1  
      label avg(Col2)''"))
    

    including empty cells:

    =INDEX(IFERROR(1/(1/QUERY(SPLIT(FLATTEN(ROW(C2:C)&"×"&OFFSET(C2,,,9^9, 9^9)*1), "×"),
     "select avg(Col2)
      group by Col1  
      label avg(Col2)''"))))