Search code examples
pythonexcelolecheminformatics

How to extract OLE objects from Excel table using Python?


I would like to use Python to extract OLE-objects from an Excel table into the Windows clipboard.

This post didn't help further since it is for VBA. And this post is still unanswered.

Assuming the given Excel table (with ChemDraw or ChemSketch OLE objects):

enter image description here

There are some Python modules which can handle Excel files, e.g. openpyxl, xlrd. The module win32clipboard can put data into the clipboard.

My Problems:

  1. I don't see how to get the embedded OLE object to the clipboard. Probably, openpyxl and xlrd together with win32clipboard are not suited for this?
  2. There is a Python module oletools which maybe can do it but I don't understand how it works. https://pypi.org/project/oletools/

From this page:

oleobj: to extract embedded objects from OLE files.

This seems to be exactly what I am looking for, however, I couldn't find any MCVEs. And unfortunately, the documentation of oleobj is basically reduced to: "read the source code and find out yourself". I would be grateful for hints and assistance.

My code so far:

### trying to extract OLE objects from Excel table into clipboard
from openpyxl import load_workbook
import win32clipboard as clpbd

def set_clipboard(data):
    clpbd.OpenClipboard()
    clpbd.EmptyClipboard()
    clpbd.SetClipboardText(data)    # I'm aware, this is only for text, is there anything for OLEs?
    clpbd.CloseClipboard()

def print_clipboard():
    clpbd.OpenClipboard()
    data = clpbd.GetClipboardData()
    clpbd.CloseClipboard()
    print(data)

wb = load_workbook(filename = 'tbChemOLE.xlsx')
ws = wb.active

myName = ws['A3'].value    # result: napthalene
myImage = ws['B3'].value   # result: None
myObject = ws['C3'].value  # result: None

set_clipboard(myName)
print_clipboard()          # result: Naphtalene
# set_clipboard(myImage)   # crash, because myImage is None
print_clipboard()     
# set_clipboard(myObject)  # crash, because myObject is None
print_clipboard()        

wb.close()
### end of code

Solution

  • In the meantime I found this post, where the OP actually didn't want the OLE objects on the clipboard, but for me it is fine. Actually, no need for openpyxl or xlrd, but win32com.client is required.

    I can get all OLE objects, however, they are indexed (probably) in the sequence of their addition. So I need to create a dictionary with the row index as key and a set of OLE object index of and name as value.

    Code:

    ### copy OLE object in certain cell to clipboard
    import win32com.client as win32
    import win32clipboard
    
    excel = win32.gencache.EnsureDispatch('Excel.Application')
    ffname = r'C:\Test\tbChemOLE.xlsx'
    wb = excel.Workbooks.Open(ffname)
    ws = wb.Worksheets.Item(1)
    objs = ws.OLEObjects()
    
    def get_all_OLEs():
        oleNo_dict = {}     # dictionary for all OLE objects
        for i in range(1,len(objs)+1):    # loop all OLE objects
            obj = objs.Item(i) 
            myRow = obj.TopLeftCell.Row        # row of OLE object
            myName = ws.Cells(myRow,1).Value   # corresponding name
            oleNo_dict[myRow] = (i, myName)
        return oleNo_dict
    
    def get_OLE(row):
        try: 
            objs[oleNo_dict[row][0]].Copy()
            win32clipboard.OpenClipboard()
            data = win32clipboard.GetClipboardData(0xC004) # Binary access
            win32clipboard.CloseClipboard()
        except Exception as e:
            print(e)
            win32clipboard.OpenClipboard()
            win32clipboard.EmptyClipboard()
            win32clipboard.CloseClipboard()
        return oleNo_dict[row]
        # and OLE is on clipboard if found
    
    oleNo_dict = get_all_OLEs()
    
    row = 4
    myMolecule = get_OLE(row)
    print(myMolecule[1], "OLE object is now on the clipboard.")
    
    wb.Close()
    excel.Application.Quit()
    ### end of code
    

    Result:

    Anthracene OLE object is now on the clipboard.