Python: downloading an href returns the page source instead of the PDF file


I'm trying to download a PDF file from the following href (I changed some values because the PDF contains personal information):

https://clients.direct-energie.com/grandcompte/factures/consulter-votre-facture/?tx_defacturation%5BdoId%5D=857AD9348B0007984D4B128F1E8BE&cHash=7b3a9f6d109dde87bd1d95b80ca1d

When I paste this href into my browser, the PDF file is downloaded directly, but when I try to use requests in my Python code, it only downloads the source code of

https://clients.direct-energie.com/grandcompte/factures/consulter-votre-facture/

Here is my code. I use Selenium to find the href on the website:

    from requests import get

    fact = driver.find_element_by_xpath(url)
    href = fact.get_attribute('href')
    print(href)      # href is correct here
    reply = get(href, stream=True)   # the keyword argument is lowercase in requests
    print(reply)     # I get the page's source code instead of the PDF

Here is the HTML found by Selenium:

    <a href="grandcompte/factures/consulter-votre-facture/?tx_defacturation%5BdoId%5D=857AD9348B0007984D4B128F1E8BE&cHash=7b3a9f6d109dde87bd1d95b80ca1d"></a>

I hope this is enough information to help. Thanks.


Solution

  • I can't use your link because it requires authentication, so I found another example of a redirecting PDF download. The settings that make Chrome download the PDF instead of displaying it are taken from this StackOverflow answer.

    import selenium.webdriver
    
    url = "https://readthedocs.org/projects/selenium-python/downloads/pdf/latest/"
    
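    download_dir = 'C:/Dev'
    # Chrome preferences: disable the built-in PDF viewer and save PDFs to download_dir
    profile = {
        "plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],
        "plugins.always_open_pdf_externally": True,  # newer Chrome versions use this instead of plugins_list
        "download.default_directory": download_dir,
        "download.extensions_to_open": "applications/pdf"
    }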
    
    options = selenium.webdriver.ChromeOptions()
    options.add_experimental_option("prefs", profile)
    driver = selenium.webdriver.Chrome(options=options)
    
    driver.get(url)
    

    From looking at the docs, the driver.get method doesn't return anything; it just tells the webdriver to navigate to a page. If you want to handle the PDF in Python before saving it to a file, then perhaps look at using Requests or Robobrowser.

    The Stream=True option from your snippet isn't available on webdriver.Chrome (stream=True is a keyword argument of Requests' get), so I'm not sure this is the method you were using, but the above should do what you want.
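
If you do want to go the Requests route, here is a minimal sketch of one way it might work. It assumes the site decides who you are from session cookies, which would explain why a bare `get(href)` returns the HTML page rather than the PDF; I can't verify that against your site. The helper names `session_from_cookies` and `download_pdf` are made up for this example, and the Content-Type check is only a heuristic.

```python
from urllib.parse import urljoin

import requests


def session_from_cookies(cookies):
    """Build a requests.Session from cookies shaped like Selenium's driver.get_cookies()."""
    session = requests.Session()
    for cookie in cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


def download_pdf(session, page_url, href, target_path):
    """Fetch href (resolved against page_url) and save it, refusing non-PDF replies."""
    # urljoin leaves an absolute href untouched and resolves a relative one.
    pdf_url = urljoin(page_url, href)
    with session.get(pdf_url, stream=True) as reply:
        reply.raise_for_status()
        content_type = reply.headers.get("Content-Type", "")
        # Assumption: the server labels the file with a PDF content type.
        if "pdf" not in content_type.lower():
            raise ValueError("expected a PDF but got %r" % content_type)
        with open(target_path, "wb") as fh:
            for chunk in reply.iter_content(chunk_size=8192):
                fh.write(chunk)
    return pdf_url
```

With a logged-in driver, usage would look something like `session = session_from_cookies(driver.get_cookies())` followed by `download_pdf(session, driver.current_url, href, "facture.pdf")`, where the file name is a placeholder. If the reply is still HTML, the ValueError tells you the session was not accepted, which is easier to spot than a saved HTML page with a .pdf extension.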