Search code examples
pythonpython-3.xexcelopenpyxlxlwings

How to detect and filter text/numbers with strikethrough in a cell containing both struck and unstruck text in Python (openpyxl)?


enter image description here

Hi, I am trying to filter out the struck texts/numbers in xlsx file using python using openpyxl. I am able to remove the cells containing struck texts and numbers. However, when there are two values in a cell, one with struck and the other unstruck, I am unable to do it. I tried splitting and then doing it but couldn't do it. I have been using cell.font.strikethrough to detect.

My code:

import openpyxl as op
wb=op.load_workbook("Some_Workbook.xlsx")
ws=wb['Sheet1']

for row in ws.iter_rows(max_row=1,max_col=2):
    for cell in row:
        if cell.font.strike:
            print('struck',cell.value)
        else:
            print('unstruck',cell.value)

wb.close()

I used the above code to find if the cell had values that were struck.

Please help.


Solution

  • It is now possible to do this using openpyxl. Starting from version 3.1.0 you can pass rich_text=True to the load_workbook function to enable rich text parsing.

    The following code lets you determine which part of a cell is strikethrough:

    import openpyxl
    from openpyxl.cell.rich_text import CellRichText
    
    wb = openpyxl.load_workbook("Some_Workbook.xlsx", rich_text=True)
    ws = wb["Sheet1"]
    
    for row in ws.iter_rows(max_row=1, max_col=2):
        for cell in row:
            if type(cell.value) == CellRichText:
                for text in cell.value:
                    if text.font.strike:
                        print("struck", text)
                    else:
                        print("unstruck", text)
            else:
                print("unstruck", cell.value)
    

    For your example this gives you the following output:

    struck ABCD
    unstruck  EFGH
    struck 1234
    unstruck  5678