Search code examples
formulaopenpyxl

Excel file needing a recovery when I create a file that uses UNIQUE function with openpyxl


I am trying to get the UNIQUE entries on a column and later count how many times they have occurred.

A B
Goose 23000
Duck 6000
Penguin 46
Ostrich 120
def add_statistics(ws, data_start_row, data_end_row):
    first = ws.max_row + 1
    ws.append([f"=SORT(UNIQUE(G{data_start_row}:G{data_end_row}))", f"=COUNTIF(G{data_start_row}:G{data_end_row}, A{first})"])
    ws.append(["", f"=COUNTIF(G{data_start_row}:G{data_end_row}, A{first+1})"])
    ws.append(["", f"=COUNTIF(G{data_start_row}:G{data_end_row}, A{first+2})"])
    ws.append(["", f"=COUNTIF(G{data_start_row}:G{data_end_row}, A{first+3})"])
    ws.append(["", f"=COUNTIF(G{data_start_row}:G{data_end_row}, A{first+4})"])

When I try to open the resulting excel file, it needs recovery and deletion of the f"=SORT(UNIQUE(G{data_start_row}:G{data_end_row}))" formula. I don't have the problem if I just manually append the UNIQUE entries or use the formula later on when I open the file and write it manually.

Is the spilling not supported by openpyxl or is there a step I need to take before I can safely use a spill formula?


Solution

  • Openpyxl does not support neither "SORT" nor "UNIQUE". According to the documentation:

    If you’re trying to use a formula that isn’t known this could be because you’re using a formula that was not included in the initial specification. Such formulae must be prefixed with _xlfn. to work.

    I checked this by importing FORMULAE and checking if they are in that set:

    from openpyxl.utils import FORMULAE
    "UNIQUE" in FORMULAE
    False
    "SORT" in FORMULAE
    False
    

    Adding _xlfn solves the error but adds @ at the start of the formula preventing it to leak into the other cells.

    f"=_xlfn.SORT(_xlfn.UNIQUE(G{data_start_row}:G{data_end_row}))"
    

    Using ArrayFormula, we can define a range of cells that we want to apply the formula. So in the end my code looked like this:

    def add_statistics(ws, data_start_row, data_end_row):
        first = ws.max_row + 1
        ws.append(["", f"=COUNTIF(G{data_start_row}:G{data_end_row}, A{first})"])
        ws.append(["", f"=COUNTIF(G{data_start_row}:G{data_end_row}, A{first+1})"])
        ws.append(["", f"=COUNTIF(G{data_start_row}:G{data_end_row}, A{first+2})"])
        ws.append(["", f"=COUNTIF(G{data_start_row}:G{data_end_row}, A{first+3})"])
        ws.append(["", f"=COUNTIF(G{data_start_row}:G{data_end_row}, A{first+4})"])
        ws[f"A{first}"] = ArrayFormula(f"A{first}:A{first+4}", f"=_xlfn.SORT(_xlfn.UNIQUE(g{data_start_row}:g{data_end_row}))")
    

    https://openpyxl.readthedocs.io/en/latest/simple_formulae.html#id1