I am trying to scrape the data from an excel sheet where some word is inside a text frame / textbox.
I tried using different library to get it but unfortunately, I can't make it to work.
Does anyone know any way for me to achieve my goals?
I tried with canvas etc but it doesn't work, it showed;
pywintypes.com_error: (-2147352567, 'Exception occurred.', (0, None,
'This member can only be accessed for a single shape.', None, 0, -2147024809), None)
I followed this Answer
My code:
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
excel.Visible = False
wb = excel.Workbooks.Open("C:\\pythonExcelScrape\\uct-test22.xlsx")
sheet = wb.Worksheets(1)
canvas = sheet.Shapes
for shp in canvas:
print (shp.TextFrame2.TextRange)
This example will extract the text using win32com.client
There may be two shape types to check for as it appears [some of] your boxes are in a group. The example below covers both.
For the example I have created a worksheet with two lots of textboxes. See the example screenshot below
Code execution.
You can look for the specific shape type first to avoid errors due to an invalid attribute being used on wrong shape type. For example if the shape is not a textbox then the attribute TextFrame2
does not apply as will be your case. Therefore you only want to call shape.TextFrame2
when the shape is in fact a textbox. So;
See example output from the example sheet below.
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
excel.Visible = False
excel_file = "C:\\pythonExcelScrape\\uct-test22.xlsx"
wb = excel.Workbooks.Open(excel_file)
ws = wb.Sheets('Sheet1') # Set appropriate sheet name
for shp in ws.Shapes:
print(f"Shape Type: '{shp.Type}'")
print(f"Shape Name: '{shp.Name}'")
if shp.Type == 17: # Type 17 is a textbox
print(f"Text Box '{shp.Name}' text: '{shp.TextFrame2.TextRange.Text}'")
print('-------------------------\n')
elif shp.Type == 6: # 6 is a group
print('\n=========================')
for grp in shp.GroupItems:
print(f"Group Shape Type: '{grp.Type}'")
print(f"Group Shape Name: '{grp.Name}'")
if grp.Type == 17:
print(f"Grouped Text Box '{grp.Name}' text: '{grp.TextFrame2.TextRange.Text}'")
print('-------------------------\n')
Output
Shape Type: '17'
Shape Name: 'TextBox 1'
Text Box 'TextBox 1' text: 'Part Number'
-------------------------
Shape Type: '17'
Shape Name: 'TextBox 2'
Text Box 'TextBox 2' text: '839-122438-00'
-------------------------
Shape Type: '6'
Shape Name: 'Group 5'
=========================
Group Shape Type: '17'
Group Shape Name: 'TextBox 3'
Grouped Text Box 'TextBox 3' text: 'Order Number'
-------------------------
Group Shape Type: '17'
Group Shape Name: 'TextBox 4'
Grouped Text Box 'TextBox 4' text: 'W7004827541'
-------------------------