python python-3.x excel openpyxl xlwings

How to detect and filter text/numbers with strikethrough in a cell containing both struck and unstruck text in Python (openpyxl)?

Hi, I am trying to filter out the struck texts/numbers in xlsx file using python using openpyxl. I am able to remove the cells containing struck texts and numbers. However, when there are two values in a cell, one with struck and the other unstruck, I am unable to do it. I tried splitting and then doing it but couldn't do it. I have been using cell.font.strikethrough to detect.

My code:

import openpyxl as op
wb=op.load_workbook("Some_Workbook.xlsx")
ws=wb['Sheet1']

for row in ws.iter_rows(max_row=1,max_col=2):
    for cell in row:
        if cell.font.strike:
            print('struck',cell.value)
        else:
            print('unstruck',cell.value)

wb.close()

I used the above code to find if the cell had values that were struck.

Please help.

Solution

It is now possible to do this using openpyxl. Starting from version 3.1.0 you can pass rich_text=True to the load_workbook function to enable rich text parsing.

The following code lets you determine which part of a cell is strikethrough:

import openpyxl
from openpyxl.cell.rich_text import CellRichText

wb = openpyxl.load_workbook("Some_Workbook.xlsx", rich_text=True)
ws = wb["Sheet1"]

for row in ws.iter_rows(max_row=1, max_col=2):
    for cell in row:
        if type(cell.value) == CellRichText:
            for text in cell.value:
                if text.font.strike:
                    print("struck", text)
                else:
                    print("unstruck", text)
        else:
            print("unstruck", cell.value)

For your example this gives you the following output:

struck ABCD
unstruck  EFGH
struck 1234
unstruck  5678