Search code examples
pythonexcelwin32com

Get text from Excel's TextFrame/Textbox in Python


I am trying to scrape the data from an excel sheet where some word is inside a text frame / textbox.
I tried using different library to get it but unfortunately, I can't make it to work.
Does anyone know any way for me to achieve my goals?

I tried with canvas etc but it doesn't work, it showed;

pywintypes.com_error: (-2147352567, 'Exception occurred.', (0, None,
'This member can only be accessed for a single shape.', None, 0, -2147024809), None)

I followed this Answer

My code:

import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
excel.Visible = False
wb = excel.Workbooks.Open("C:\\pythonExcelScrape\\uct-test22.xlsx")
sheet = wb.Worksheets(1)
canvas = sheet.Shapes
for shp in canvas:
    print (shp.TextFrame2.TextRange)

Solution

  • This example will extract the text using win32com.client

    There may be two shape types to check for as it appears [some of] your boxes are in a group. The example below covers both.

    For the example I have created a worksheet with two lots of textboxes. See the example screenshot below

    • The first two textboxes on row 1 are individual [single] textboxes.
    • The second two textboxes on row 3 are textboxes in a group.

    Code execution.
    You can look for the specific shape type first to avoid errors due to an invalid attribute being used on wrong shape type. For example if the shape is not a textbox then the attribute TextFrame2 does not apply as will be your case. Therefore you only want to call shape.TextFrame2 when the shape is in fact a textbox. So;

    1. For the first two textbox the code is looking for shape type 17, (17 is the type number of a textbox). Then you can simply extract the text from the shape.
    2. For the second two textbox the code finds a group shape first, then loops through the group to extract each textbox as in step 1. This time the shape type to look for is 6, (6 is the type number of a group).

    See example output from the example sheet below.

    import win32com.client as win32
    
    excel = win32.gencache.EnsureDispatch('Excel.Application')
    excel.Visible = False
    excel_file = "C:\\pythonExcelScrape\\uct-test22.xlsx"
    wb = excel.Workbooks.Open(excel_file)
    ws = wb.Sheets('Sheet1')  # Set appropriate sheet name 
    
    for shp in ws.Shapes:
        print(f"Shape Type: '{shp.Type}'")
        print(f"Shape Name: '{shp.Name}'")
        if shp.Type == 17:  # Type 17 is a textbox
            print(f"Text Box '{shp.Name}' text: '{shp.TextFrame2.TextRange.Text}'")
            print('-------------------------\n')
        elif shp.Type == 6:  # 6 is a group
            print('\n=========================')
            for grp in shp.GroupItems:
                print(f"Group Shape Type: '{grp.Type}'")
                print(f"Group Shape Name: '{grp.Name}'")
                if grp.Type == 17:
                    print(f"Grouped Text Box '{grp.Name}' text: '{grp.TextFrame2.TextRange.Text}'")
                    print('-------------------------\n')
    
    

    Example Sheet
    enter image description here

    Output

    Shape Type: '17'
    Shape Name: 'TextBox 1'
    Text Box 'TextBox 1' text: 'Part Number'
    -------------------------
    
    Shape Type: '17'
    Shape Name: 'TextBox 2'
    Text Box 'TextBox 2' text: '839-122438-00'
    -------------------------
    
    Shape Type: '6'
    Shape Name: 'Group 5'
    
    =========================
    Group Shape Type: '17'
    Group Shape Name: 'TextBox 3'
    Grouped Text Box 'TextBox 3' text: 'Order Number'
    -------------------------
    
    Group Shape Type: '17'
    Group Shape Name: 'TextBox 4'
    Grouped Text Box 'TextBox 4' text: 'W7004827541'
    -------------------------