Search code examples
pythonpython-3.xopenpyxlexcel-365

How can I correctly write a Python filter function in Excel 365?


While working on a project, I wanted to use the filter function in excel in order to display things in a neat and professional fashion. Unfortunately, openpyxl (3.1.4) doesn't support the filter function, so one has to add it to the python code like this:

example = "=_xlfn.FILTER(Index!$D$4:$D$100,Index!$F$4:$F$100=\"JA\",\"\")"
ws['C4']= example

When the program runs, it enters the data into the workbook...with an @ at the beginning of the formula like so:

=@FILTER(Index!$D$4:$D$100;Index!$F$4:$F$100="JA";"")

This symbol causes the filter function to only display the first instance of an item instead of displaying the entire list. Of course when one deletes the symbol everything works normally.

Of course I can manually delete it every time, but that would get annoying after a while since this program will be used quite often. How can I utilize this powerful feature in Python? Or do I need to just grin and bear it?


Solution

  • Short answer

    I am afraid you cannot use Dynamic array formulas and spilled array behavior with OpenPyXL up to now. The spilled array behavior was introduced in Excel version 365. It is not usable in former versions. OpenPyXL was introduced and bases on specifications when Excel Version was 2007 (the first version using Office Open XML and *.xlsx file format).

    But why the @ symbol?

    This get described in Implicit intersection operator: @:

    The @ symbol is ... used ... to indicate implicit intersection. ... the @ indicates that the formula should use implicit intersection to retrieve the value

    If OpenPyXL stores the FILTER formula, then it does this like Excel versions before 365 would store formulas. It not considers spilled array behavior, which would need to store special cell metadata to describe the spilled array behavior - spilling values over as much cells as needed. There are not even classes for those metadata in OpenPyXL up to now. Thus, when Excel opens the file, it finds a FILTER formula without spilled array behavior, which cannot work correctly and only returns one value. It marks that with @.

    Can you remove the @?

    Not really, I am afraid. As said you cannot use dynamic array formulas and spilled array behavior using OpenPyXL up to now. One could use FILTER in old array formula context. But that needs to know the size of the resulting array as old array formulas not are dynamic and explicitly not spilling their values over as much cells as needed.

    Example:

    from openpyxl import Workbook
    from openpyxl.worksheet.formula import ArrayFormula  
    wb = Workbook()
    
    ws = wb.create_sheet("Index", 1)
    ws['D3'] = ""
    data = [{'D':"D4"}, {'D':"D5", 'F':"JA"}, {'D':"D6"}, {'D':"D7", 'F':"JA"}, {'D':"D8"}]
    for row in data:
        ws.append(row)
        
    ws = wb.active;
    formula = "=_xlfn.FILTER(Index!$D$4:$D$100,Index!$F$4:$F$100=\"JA\",\"\")"
    ws['C4'] = ArrayFormula('C4:C5', formula)
    
    wb.save('excelResult.xlsx')