Search code examples
excelexcel-formulaapache-poiworksheet-function

Apache POI and SUBTOTAL formula


=SUMPRODUCT((K:K="yes")*(SUBTOTAL(103,OFFSET(K10,ROW(K:K)-MIN(ROW(K10:K10)),0))))

and

=SUMPRODUCT((K:K="yes")*(SUBTOTAL(103,OFFSET(K10,ROW(K:K)-MIN(ROW(K10:K10)),0))))+(NOW()*0)

This is the excel formula that is used in one of the sheets in the generated xls workbook using java poi api. It evaluates correctly only if I press Enter on the cell in Excel. Formula evaluator and wb.setForceFormulaRecalculation(true) doesn't seem to work.

The java code is :

cell.setCellFormula("SUMPRODUCT((K:K=\"yes\")*(SUBTOTAL(103,OFFSET(K10,ROW(K:K)-MIN(ROW(K10:K10)),0))))");

The goal of the formula is to count the occurence of "yes" in column K but only for visible rows after filtering. K10 is the cell from where the actual data starts. Rows above this cell contain headers.


Solution

  • I can confirm this to be a problem using HSSF. But it has nothing to do with formula recalculation. OFFSET itself is volatile as well as NOW. Both force formula recalculation even without setting setForceFormulaRecalculation(true). But in the binary *.xls file the ROW(K:K) in OFFSET does not initially evaluating as array. So it is only once evaluated to 1 (ROW(K1)) instead to the array {1,2,3,4,5,...} (ROW(K1), ROW(K2), ROW(K3), ...). Using XSSF (*.xlsx) it works.

    I found the problem is how apache poi creates the HSSF formula for SUMPRODUCT. SUMPRODUCT always is an array function. So all functions embedded in SUMPRODUCT also should be array functions. But apache poi does not set the CLASS_ARRAY to the functions which are embedded in SUMPRODUCT. Instead it sets CLASS_VALUE as if the functions were stand alone.

    The following working draft shows this problem. There is a method makeArrayFormula which changes all FuncVarPtg (function variable parse things) which were CLASS_VALUE to CLASS_ARRAY. After running that method, the SUMPRODUCT formula works as expected also in HSSF.

    import java.io.FileOutputStream;
    import org.apache.poi.ss.usermodel.*;
    import org.apache.poi.hssf.usermodel.HSSFWorkbook;
    import org.apache.poi.hssf.usermodel.HSSFCell;
    import org.apache.poi.xssf.usermodel.XSSFWorkbook;
    
    import org.apache.poi.hssf.record.CellValueRecordInterface;
    import org.apache.poi.hssf.record.FormulaRecord;
    import org.apache.poi.hssf.record.aggregates.FormulaRecordAggregate;
    import org.apache.poi.ss.formula.ptg.Ptg;
    import org.apache.poi.ss.formula.ptg.FuncVarPtg;
    
    import java.lang.reflect.Field;
    
    class CreateExcelFormula {
    
     static void makeArrayFormula(HSSFCell formulaCell) throws Exception {
      Field _record = HSSFCell.class.getDeclaredField("_record");
      _record.setAccessible(true); 
      CellValueRecordInterface recordInterface = (CellValueRecordInterface)_record.get(formulaCell);
      System.out.println(recordInterface);
      if (recordInterface instanceof FormulaRecordAggregate) {
       FormulaRecordAggregate formulaRecordAggregate = (FormulaRecordAggregate)recordInterface;
       FormulaRecord formulaRecord = formulaRecordAggregate.getFormulaRecord();
       Ptg[] ptgs = formulaRecord.getParsedExpression();
       for (Ptg ptg : ptgs) {
        if (ptg instanceof FuncVarPtg) {
         if (ptg.getPtgClass() == Ptg.CLASS_VALUE) {
          ptg.setClass(Ptg.CLASS_ARRAY);
         }
        }
       }
       formulaRecord.setParsedExpression(ptgs);
      }
      System.out.println(recordInterface);
     }
    
     public static void main(String[] args) throws Exception {
    
      try (
           Workbook workbook = new HSSFWorkbook(); FileOutputStream fileout = new FileOutputStream("Excel.xls") ) {
           //Workbook workbook = new XSSFWorkbook(); FileOutputStream fileout = new FileOutputStream("Excel.xlsx") ) {
    
       Sheet sheet = workbook.createSheet();
       Row row;
       Cell cell;
    
       row = sheet.createRow(0);
    
       cell = row.createCell(0);
       cell.setCellValue("F:");
       cell = row.createCell(1);
       //cell.setCellFormula("SUMPRODUCT((K:K=\"yes\")*(SUBTOTAL(103,OFFSET(K10,ROW(K:K)-MIN(ROW(K10:K10)),0))))");
       cell.setCellFormula("SUMPRODUCT((K10:K10000=\"yes\")*(SUBTOTAL(103,OFFSET(K10,ROW(K10:K10000)-ROW(K10),0))))");
       if (cell instanceof HSSFCell) {
        makeArrayFormula((HSSFCell)cell);
       }
    
       for (int r = 9; r < 30; r++) {
        row = sheet.createRow(r);
        cell = row.createCell(10);
        if (r % 2 == 0) cell.setCellValue("yes"); else cell.setCellValue("no");
       }
    
       for (int r = 14; r < 19; r++) {
        sheet.getRow(r).setZeroHeight(true);
       }  
    
       workbook.write(fileout);
      }
    
     }
    }
    

    Btw.: One should never using full column references like K:K in array formulas. This is a performance night mare. And the MIN around ROW(K10:K10) is superfluous.