
Links from list - How to generate several PDFs using Python pdfkit


I am currently trying to figure out how I can take a list of links and have Python run through all of them, saving each one as a PDF. (I'm not a Python expert.)

I found a Python package called "pdfkit" which is quite good, but how do I set it up so that it follows my URL list and saves each PDF under a different name?

import pdfkit

config = pdfkit.configuration(wkhtmltopdf="C:\\Program Files (x86)\\wkhtmltopdf\\bin\\wkhtmltopdf.exe")

pdfkit.from_url('http://google.com', 'MyPDF.pdf', configuration=config)

This is my current code. Let's say I have a list of 10 webpages that I want to save as 10 different PDF files; how would I set that up?

Another issue is that I need to log in to the page in order to scrape the information from the links. How would you implement that?

Best Regards,


Solution

  • Answer to the first question:

    import pdfkit
    
    config = pdfkit.configuration(wkhtmltopdf="C:\\Program Files (x86)\\wkhtmltopdf\\bin\\wkhtmltopdf.exe")
    
    # Each entry pairs a URL with the filename its PDF should be saved under
    url_list = [
        ['http://google.com', 'google.com.pdf'],
        ['http://facebook.com', 'facebook.com.pdf'],
        ['http://yahoo.com', 'yahoo.com.pdf'],
    ]
    
    for url, filename in url_list:
        pdfkit.from_url(url, filename, configuration=config)
    

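    If you'd rather not maintain the filenames by hand, you can also derive them from each URL. A minimal sketch, assuming the hostname makes an acceptable filename:

    import pdfkit
    from urllib.parse import urlparse
    
    config = pdfkit.configuration(wkhtmltopdf="C:\\Program Files (x86)\\wkhtmltopdf\\bin\\wkhtmltopdf.exe")
    
    urls = ['http://google.com', 'http://facebook.com', 'http://yahoo.com']
    
    for url in urls:
        # e.g. 'http://google.com' -> 'google.com.pdf'
        filename = urlparse(url).netloc + '.pdf'
        pdfkit.from_url(url, filename, configuration=config)
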
  • Answer to the second question: you can use the requests module's session feature to log in first and then pass the session cookies to pdfkit when downloading the page, as sketched below. See Create PDF of a https webpage which requires login using pdfkit
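
    A minimal sketch of that approach, assuming a form-based login; the login URL and the form field names ('username', 'password') are placeholders you would need to adapt to the actual site:

    import pdfkit
    import requests
    
    config = pdfkit.configuration(wkhtmltopdf="C:\\Program Files (x86)\\wkhtmltopdf\\bin\\wkhtmltopdf.exe")
    
    # Log in once with a requests session so it captures the auth cookies.
    # The URL and form fields here are placeholders for your site's login form.
    session = requests.Session()
    session.post('https://example.com/login',
                 data={'username': 'me', 'password': 'secret'})
    
    # pdfkit forwards cookies to wkhtmltopdf as a list of (name, value)
    # tuples under the 'cookie' option
    options = {'cookie': list(session.cookies.items())}
    
    pdfkit.from_url('https://example.com/protected-page', 'protected.pdf',
                    configuration=config, options=options)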