
scrapy-xlsx: How to make a clickable link in an xlsx file?


I don't know whether my code is responsible for this issue. From this question (Scapy, Python - To create a clickable link with CSV), I learned that CSV does not support hyperlink formatting. So I found the scrapy-xlsx package, installed it, and ran my spider with it:

    scrapy crawl GoogleScrapyBot -o output.xlsx

This produces output.xlsx. When I open it on my PC (Windows 10, Office 365), the links are shown as plain text; the hyperlink is not set.

But when I click into the input box for such a cell in Excel, the text changes to a clickable link.

I don't want to do this manually in Excel. Is there a way to do it from Scrapy?
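One way to see what is going on is to inspect a cell with openpyxl. The sketch below is a hypothetical reproduction, assuming the exporter stores each scraped URL as an ordinary string value (it is not scrapy-xlsx's actual code). Such a cell holds text only and has no hyperlink attached, which is consistent with Excel showing it as plain text until the cell is edited and Excel auto-formats it.

```python
from openpyxl import Workbook

# Hypothetical reproduction: assume the exporter appends each URL as a plain string.
wb = Workbook()
ws = wb.active
ws.append(["link"])                  # header row
ws.append(["https://example.com/"])  # data row holding a URL string

cell = ws["A2"]
# The cell stores a string ("s" data type); no hyperlink object is attached.
print(repr(cell.value), cell.data_type, cell.hyperlink)
```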


Solution

  • You can add this method to your GoogleBotsSpider class. It uses openpyxl, which is installed along with scrapy-xlsx. Don't forget to add from openpyxl import load_workbook at the top of the spider module.

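    def close(self, reason):
        # Spider.close is a staticmethod, so the spider is passed explicitly;
        # it calls self.closed(reason) if the spider defines one
        super().close(self, reason)
        feed_path = self.settings.get("FEED_URI")  # e.g. "output.xlsx"
        wb = load_workbook(feed_path)
        ws = wb.active
        # min_row=2 skips the first row, which contains the column headers
        for row in ws.iter_rows(min_row=2):
            cell = row[0]  # first cell of the row; adapt the index to your code
            if cell.value:
                cell.hyperlink = cell.value  # attach the URL as a hyperlink
                cell.style = "Hyperlink"  # Excel's built-in hyperlink cell style
        wb.save(feed_path)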
    

    You also need to add the setting FEED_URI = "output.xlsx" to settings.py, although you could also read the file name from sys.argv.
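The same post-processing can also live outside the spider as a plain function, which makes it easy to try on an existing file and covers the sys.argv option mentioned above. This is a sketch under stated assumptions: add_hyperlinks and feed_path_from_argv are hypothetical helper names, the URLs are assumed to sit in one column below a single header row, and the argv parsing only looks for -o/--output. To my understanding, newer Scrapy versions (2.1+) map -o to the FEEDS setting rather than FEED_URI, so reading the path from the command line may be the more robust choice there.

```python
from openpyxl import load_workbook


def feed_path_from_argv(argv, default="output.xlsx"):
    """Hypothetical helper: return the value that follows -o/--output in argv.

    Assumes a command like: scrapy crawl GoogleScrapyBot -o output.xlsx
    """
    for flag in ("-o", "--output"):
        if flag in argv:
            i = argv.index(flag)
            if i + 1 < len(argv):
                return argv[i + 1]
    return default


def add_hyperlinks(path, column=1, header_rows=1):
    """Hypothetical helper: turn plain-text URLs in one column into hyperlinks.

    column is 1-based (1 = column A); the first header_rows rows are skipped.
    """
    wb = load_workbook(path)
    ws = wb.active
    # Restricting min_col/max_col to one column yields 1-tuples of cells.
    for (cell,) in ws.iter_rows(min_row=header_rows + 1, min_col=column, max_col=column):
        if isinstance(cell.value, str) and cell.value:
            cell.hyperlink = cell.value  # openpyxl wraps the string in a Hyperlink
            cell.style = "Hyperlink"     # built-in named style (blue, underlined)
    wb.save(path)
```

For example, the close method could call add_hyperlinks(feed_path_from_argv(sys.argv)) after super().close(self, reason), with import sys at the top of the spider module.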