python html web-scraping export-to-excel xlsxwriter

Pasting into Excel with Python


When I manually paste the text of a website with tables into Excel, the tables keep their layout and cell shading. Trying the same with an Excel package such as xlsxwriter only lets me put the entire text into a single cell. Is there a way around this?


Solution

  • The clipboard and pasting are actually rather more complex than they look to a user, and their behaviour can be very application-specific at both the copy-from and the paste-to end. This is A Good Thing for users, because it enables nice transformations like the one you are seeing, where an HTML table turns into Excel cells.

    When something is copied onto the clipboard, the copied-from application can (if it wants to) provide it in several different formats (e.g. raw text, RTF-formatted text, etc.). Apparently, when copying from Microsoft Word, it puts whatever you copied onto the clipboard in 13 different formats; see https://code.google.com/p/clipboardviewer/.

    When pasting, the receiving application can choose from the available formats and can, of course, further process whichever one it picks. So somewhere between your browser and Excel, possibly at the Excel end, something recognizes that the source is a table and handles the HTML nicely.

    You aren't using copy/paste, so I'm afraid you will have to implement the HTML processing yourself. Putting the raw text into the target cell (and the cells below and to the right of it) is the easy bit; you will also have to write code to extract and apply the text formatting, cell colour, text alignment, etc.

    I don't know if it's possible, but if you can get it to work, it might be neat to use Python to automate the GUI copy/paste operations, so that copy/paste behaves exactly as if you were pressing the keys manually.
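A minimal sketch of the do-it-yourself route, assuming the page's tables are simple (no nested tables, no `colspan`/`rowspan`) and that shading is given as a six-digit hex colour in a `bgcolor` attribute or an inline `background`/`background-color` style. Colours set through CSS classes or external stylesheets are not picked up, and rows from every table on the page are stacked one after another. It uses the standard-library `html.parser` to collect each cell's text and shading, then writes them cell by cell through xlsxwriter's `add_format()` and `worksheet.write()`. The names `TableGrid`, `cell_background`, `write_grid` and `HEX_COLOUR` are hypothetical, not part of any library.

```python
import re
from html.parser import HTMLParser

# Assumption: shading is a six-digit hex colour such as "#ffcc00".
HEX_COLOUR = r"#[0-9a-fA-F]{6}"


def cell_background(attrs):
    """Return an '#RRGGBB' background from style= or bgcolor=, else None."""
    style = attrs.get("style") or ""
    m = re.search(r"background(?:-color)?\s*:\s*(" + HEX_COLOUR + r")\b",
                  style, re.IGNORECASE)
    if m:
        return m.group(1).upper()
    bgcolor = (attrs.get("bgcolor") or "").strip()
    if re.fullmatch(HEX_COLOUR, bgcolor):
        return bgcolor.upper()
    return None


class TableGrid(HTMLParser):
    """Collect <tr>/<td>/<th> cells as rows of dicts (flat tables only)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cells per <tr>
        self._cell = None  # the cell currently receiving text, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self.rows.append([])
            self._cell = None
        elif tag in ("td", "th") and self.rows:
            self._cell = {"text": "", "bg": cell_background(attrs),
                          "header": tag == "th"}
            self.rows[-1].append(self._cell)

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr"):
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"] += data


def write_grid(rows, workbook, worksheet, first_row=0, first_col=0):
    """Write cells through xlsxwriter-style workbook/worksheet objects."""
    formats = {}  # cache: one Format per (background, is_header) combination
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            key = (cell["bg"], cell["header"])
            if key not in formats:
                props = {"bold": cell["header"]}
                if cell["bg"]:
                    # With no pattern set, xlsxwriter fills bg_color solidly.
                    props["bg_color"] = cell["bg"]
                formats[key] = workbook.add_format(props)
            text = " ".join(cell["text"].split())  # collapse HTML whitespace
            worksheet.write(first_row + r, first_col + c, text, formats[key])


# Usage with xlsxwriter (not run here; `html_text` is the page's HTML):
#   import xlsxwriter
#   parser = TableGrid()
#   parser.feed(html_text)
#   workbook = xlsxwriter.Workbook("tables.xlsx")
#   worksheet = workbook.add_worksheet()
#   write_grid(parser.rows, workbook, worksheet)
#   workbook.close()
```

Passing the workbook and worksheet in, rather than creating them inside `write_grid`, keeps the parsing testable without xlsxwriter and lets you place the table anywhere via `first_row`/`first_col`. Merged cells would need `worksheet.merge_range()`, and alignment or fonts would need more `add_format()` properties; this sketch attempts neither.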