Tags: excel, worksheet-function, array-formulas

Optimizing multiple-criteria IFs


I'm performing array calculations that are taking a long time to complete. I'd like to optimize my formulas some more. All of the formulas are of the same nature - they perform some high-level function (Average, Slope, Min, Max) across a column of values. However, not all cells in a column are included in the array. I use multiple IF criteria to choose which cells get included. All comparisons are made to the current row. Here's an example of the data:

     A             B                C            D          E
1    Company       Generation       Date         Value      ToCalculate
2    Abc           1                1/1/2010     5.6          
3    ...           ...              ...          ...        ...

Column E would look something like this (entered as an array formula):

{=Average(If(A2=A2:A1000, If(B2=B2:B1000, If(C2 > C2:C1000, D2:D1000))))}

So once E2 is calculated, I autofill it down column E. Columns F, G, H, ... use the same approach, each either selecting different values to operate on or applying a different function. My dataset is quite large, and with only a few of these columns the spreadsheet takes over an hour to compute. Every so often I add a fourth criterion, with all the other criteria unchanged.
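
To make the cost concrete, here is a rough Python model of what one cell of column E evaluates. It is only a sketch: `Row`, `nested_if_average` and the sample rows are made up for illustration, and it assumes the criteria ranges are meant to cover the whole data block (as with absolute references such as $A$2:$A$1000). Each cell scans every row, so filling the formula down n rows means roughly n² comparisons for every such column.

```python
import datetime as dt
from typing import NamedTuple, Optional


class Row(NamedTuple):
    company: str      # column A
    generation: int   # column B
    date: dt.date     # column C
    value: float      # column D


def nested_if_average(rows: list[Row], i: int) -> Optional[float]:
    """Model one cell: AVERAGE(IF(A=Ai, IF(B=Bi, IF(Ci > C, D))))."""
    cur = rows[i]
    # Every cell walks the full range, testing all criteria on every row.
    picked = [
        r.value
        for r in rows
        if r.company == cur.company
        and r.generation == cur.generation
        and cur.date > r.date  # IF(C2 > C2:C1000): strictly earlier dates
    ]
    # An empty selection gives #DIV/0! in Excel; None stands in for it here.
    return sum(picked) / len(picked) if picked else None


rows = [
    Row("Abc", 1, dt.date(2010, 1, 1), 5.5),
    Row("Abc", 1, dt.date(2010, 2, 1), 6.5),
    Row("Abc", 1, dt.date(2010, 3, 1), 7.0),
    Row("Xyz", 1, dt.date(2010, 3, 1), 9.9),
]
# Filling down = one full scan per row, i.e. about len(rows) ** 2 checks.
column_e = [nested_if_average(rows, i) for i in range(len(rows))]
print(column_e)  # [None, 5.5, 6.0, None]
```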

Is there a more efficient approach? Some thoughts:

  1. Can I use a single array formula per column instead of thousands per column?
  2. Can I condense the first three criteria so that the output is a list of row numbers? Then subsequent formulas wouldn't have to search for multiple criteria but could just perform the function.
  3. Or could I somehow build the criteria up? For example, one new column returns all rows where the company is the same, another column returns all rows from the first column where the generation is the same, and so on.
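
Idea 3 amounts to building an index once instead of re-testing company and generation in every cell. A minimal Python sketch of that idea (the names `group_rows` and `average_earlier` are hypothetical, and the sample data is made up): rows are bucketed by (company, generation), and each row then applies only the remaining date test to its own bucket.

```python
import datetime as dt
from collections import defaultdict
from typing import NamedTuple, Optional


class Row(NamedTuple):
    company: str      # column A
    generation: int   # column B
    date: dt.date     # column C
    value: float      # column D


Index = dict[tuple[str, int], list[int]]


def group_rows(rows: list[Row]) -> Index:
    """Build the index once: (company, generation) -> row indices."""
    groups: Index = defaultdict(list)
    for idx, r in enumerate(rows):
        groups[(r.company, r.generation)].append(idx)
    return dict(groups)


def average_earlier(rows: list[Row], groups: Index, i: int) -> Optional[float]:
    """Apply only the remaining (date) criterion, within row i's own group."""
    cur = rows[i]
    picked = [
        rows[j].value
        for j in groups[(cur.company, cur.generation)]
        if rows[j].date < cur.date
    ]
    return sum(picked) / len(picked) if picked else None


rows = [
    Row("Abc", 1, dt.date(2010, 1, 1), 5.5),
    Row("Abc", 1, dt.date(2010, 2, 1), 6.5),
    Row("Abc", 1, dt.date(2010, 3, 1), 7.0),
    Row("Xyz", 1, dt.date(2010, 3, 1), 9.9),
]
groups = group_rows(rows)
print(groups)  # {('Abc', 1): [0, 1, 2], ('Xyz', 1): [3]}
print([average_earlier(rows, groups, i) for i in range(len(rows))])  # [None, 5.5, 6.0, None]
```

With many companies and generations the buckets are small, so the per-row work drops from scanning the whole sheet to scanning one group.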

Solution

  • For the average you can do without array formulas:

     =AVERAGEIFS(D2:D$1000,A2:A$1000,A2,B2:B$1000,B2,C2:C$1000,"<"&C2)
    

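    As there are also COUNTIFS and SUMIFS, I think your slopes could be calculated the same way: the least-squares slope only needs the count and the sums of x, y, x*y and x^2, and the two product sums can come from helper columns (for example C*D and C^2), since SUMIFS cannot multiply ranges itself.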
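
As a sketch of why counts and sums are enough, this Python function computes the ordinary least-squares slope (algebraically the same quantity SLOPE returns) from n and four sums only. It assumes the slope is of Value (column D) against Date (column C, as serial numbers); `slope_from_sums` is a hypothetical name, and in the sheet n would be one COUNTIFS and each sum one SUMIFS with the same criteria.

```python
from typing import Optional, Sequence


def slope_from_sums(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Least-squares slope of ys on xs from n and four sums.

    In the sheet: n = COUNTIFS(...), sum_x = SUMIFS over C, sum_y = SUMIFS
    over D, and sum_xy / sum_xx = SUMIFS over helper columns holding C*D
    and C^2, all with the same criteria as the AVERAGEIFS formula.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = n * sum_xx - sum_x * sum_x
    if denom == 0:
        # Fewer than two distinct x values: the slope is undefined
        # (the sheet would show an error here).
        return None
    return (n * sum_xy - sum_x * sum_y) / denom


print(slope_from_sums([1, 2, 3, 4], [3, 5, 7, 9]))  # 2.0
```

With real dates (serial numbers around 40,000) the terms n * sum(x^2) and sum(x)^2 are large and nearly cancel; subtracting a fixed reference date from column C before summing keeps the result accurate and does not change the slope.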

    For the remaining functions (MAX, MIN, etc.), each would need to be analyzed case by case.

    I did a quick performance test and this appears to be faster, but of course my datasets were just mock data.

    HTH!

    Note: Excel 2007 and up only!

    Edit: answering your comment.

    Without knowing the dimensions of the problem it is difficult to give advice, but I'll risk a suggestion anyway:

    You could write a VBA macro that:

    1) Generates a new sheet for each company-generation pair
    2) Sorts the data in those sheets by date
    3) Adds the formulas to those sheets (no conditionals needed in this context)
    4) Recalculates, reads the results of those formulas, and populates the original sheet
    5) Deletes the auxiliary sheets
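
The helper-sheet steps boil down to: split by company and generation, sort by date, then compute each row's result from what came before it, with no per-row criteria. A Python analogue of that idea (not VBA; `earlier_stats` and the sample data are hypothetical) is below. It assumes each row should see all rows of its group with strictly earlier dates, as in the question's IF(C2 > ...), and keeps a running sum, count and maximum so AVERAGE and MAX come out of a single pass.

```python
import datetime as dt
from collections import defaultdict
from itertools import groupby
from typing import NamedTuple, Optional


class Row(NamedTuple):
    company: str      # column A
    generation: int   # column B
    date: dt.date     # column C
    value: float      # column D


Stats = tuple[Optional[float], Optional[float]]  # (average, max)


def earlier_stats(rows: list[Row]) -> list[Stats]:
    """(average, max) of the values in each row's group with earlier dates."""
    # Step 1: split rows by (company, generation).
    groups: dict[tuple[str, int], list[int]] = defaultdict(list)
    for idx, r in enumerate(rows):
        groups[(r.company, r.generation)].append(idx)

    result: list[Stats] = [(None, None)] * len(rows)
    for indices in groups.values():
        # Step 2: sort each group by date (stable, so ties keep sheet order).
        indices.sort(key=lambda j: rows[j].date)
        total, count = 0.0, 0
        best: Optional[float] = None
        # Step 3: one sweep with running totals. Rows sharing a date must not
        # see each other, so each date is handled as a batch: read the stats
        # accumulated so far, then fold the batch into them.
        for _, same_day in groupby(indices, key=lambda j: rows[j].date):
            batch = list(same_day)
            avg = total / count if count else None
            for j in batch:
                # Step 4: write back at the row's original position.
                result[j] = (avg, best)
            for j in batch:
                v = rows[j].value
                total += v
                count += 1
                best = v if best is None else max(best, v)
    return result


rows = [
    Row("Abc", 1, dt.date(2010, 1, 1), 5.5),
    Row("Abc", 1, dt.date(2010, 2, 1), 6.5),
    Row("Abc", 1, dt.date(2010, 2, 1), 8.0),
    Row("Abc", 1, dt.date(2010, 3, 1), 7.0),
    Row("Xyz", 1, dt.date(2010, 3, 1), 9.9),
]
stats = earlier_stats(rows)
print(stats[:3])  # [(None, None), (5.5, 5.5), (5.5, 5.5)]
```

Sorting costs O(n log n) per group and the sweep is linear, compared with a full scan for every cell in the nested-IF version. MIN can be tracked the same way as the maximum.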