Search code examples
pythonexcelxlwingspyperclip

Insert Formatted HTML into Excel sheet row by row


I have a rather odd use case that I'm having trouble solving. I have formatted HTML code (namely, HTML with colored text) that I want to preserve the formatting on and paste into Excel on a specified sheet. These documents can be rather long (e.g. few hundred rows), so putting them in one big cell isn't really a viable option (although I tried that as well and couldn't get it to work), and I think row by row is probably the best route.

I have been trying this, but it appears that xlwings only will respect the clipboard of xlwings; i.e. pyperclip copies it to the clipboard, but the paste fails since xlwings doesn't recognize anything being on the clipboard (said differently, if you do a .copy immediately before, it works, but it doesn't have the data I need!)

Looking for any help. This is just text, not an HTML table. I do know and recognize that this is an odd use case, but it is what it is....

#create new workbook
import xlwings as xl
import pyperclip

wb = xl.Book()

#read in html to string
with open('test.html', 'r') as file:
    data = file.read()

#copy the html to the clipboard so that the HTML format is correct?
pyperclip.copy(data)

#wb.sheets("Sheet1").range("A1").copy()    #this prevents the error, but doesn't have the data I need. Proves xlwings doesn't respect the pyperclip clipboard
wb.sheets("Sheet1").range("A1").paste() #fails since xlwings doesn't respect pyperclip clipboard

Simple html example that can be used for testing

<body>
    <p>Normal Text</p>
    <p style="color:red;">Red paragraph text</p>
    <p>Normal Text</p>
</body>

Said differently, I would like a way via Python to take the html above, which looks like this when rendered:

enter image description here

And put it into Excel, so that Excel looks like this:

enter image description here


Solution

  • Why not just read the data and write line by line to the Sheet?

    import xlwings as xl
    
    
    #read in html to string
    with open('Output.html', 'r') as file:
        #data = file.read()
        data = file.readlines()  # Html lines as separate lines
    
    with xl.App(visible=False) as app:
        wb = xl.Book()
    
        ### Write each line to the next row in the Sheet down Column A
        for row, line in enumerate(data, 1):
            wb.sheets("Sheet1").range(f'A{row}').value = line
            wb.sheets("Sheet1").range(f'A{row}').api.WrapText = False
    
        wb.save('Output.xlsx')
    

    Update
    To do what you want per the update all you want to do is open the file as html and save it.

    Assuming your html example has the name 'test.html'

    <body>
        <p>Normal Text</p>
        <p style="color:red;">Red paragraph text</p>
        <p>Normal Text</p>
    </body>
    

    E.g. the following code will open the file in Excel and save as an XLSX.
    That's all that is required to produce the Excel Sheet per your screenshot in the question.

    import xlwings as xl
    
    
    with xl.App(visible=False) as app:
        wb = xl.Book('test.html')
    
        wb.sheets.active.autofit()  # Autofit Column A if desired 
    
        wb.save('Output1.xlsx')