Tags: python, python-requests, httpx

How do I download PDF files using Python's requests/httpx module?


I'm making a program that downloads PDFs from the internet.

Here's an example of the code:

import httpx # <-- This also happens with the requests module


URL = "http://62.182.86.140/main/0/aee7239ffcf7871e1d6687ced1215e22/Markus%20Nix%20-%20Exploring%20Python-Entwickler%20%282005%29.djvu"
content = httpx.get(URL, timeout=20.0).content.decode("ascii")

with open(f"./example.pdf", "w") as f:
    f.write(str(content))

But when I write the result to a file, none of my PDF viewers (I tried Okular and Zathura) can open it.

When I download the same file with a program like wget, there are no problems.

When I compare the two files (one downloaded with Python, the other with wget), the Python one looks encoded throughout, and I can't figure out how to decode it (.decode() doesn't work).


Solution

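Write the raw bytes in r.content to a file opened in binary mode ('wb') instead of decoding them to text:

import httpx


def main(url):
    r = httpx.get(url, timeout=20)
    # Fail early on an HTTP error instead of saving an error page
    r.raise_for_status()
    # The URL points to a DjVu file, so it is saved with a .djvu extension
    with open('file.djvu', 'wb') as f:
        # r.content is bytes; write it unchanged, with no decode() or str()
        f.write(r.content)


main('http://62.182.86.140/main/0/aee7239ffcf7871e1d6687ced1215e22/Markus%20Nix%20-%20Exploring%20Python-Entwickler%20%282005%29.djvu')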
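As for why the original attempt likely fails: the response body is binary data, and a binary file usually contains bytes outside the ASCII range, so decoding it or passing it through str() does not give back the original file. The snippet below is only a minimal sketch of that effect. It does not download anything; `payload` is a made-up stand-in for `r.content`, and the file name `example.bin` is likewise an assumption chosen for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for r.content: a few bytes shaped like the start
# of a binary file, including bytes >= 0x80 that are not valid ASCII.
payload = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

# Decoding as ASCII raises, because some bytes fall outside the ASCII range.
try:
    payload.decode("ascii")
    decode_ok = True
except UnicodeDecodeError:
    decode_ok = False

# str() on a bytes object returns its repr ("b'...'"), not the data itself,
# so writing that text to a file produces something no viewer can open.
as_text = str(payload)

# Binary mode ("wb") writes the bytes unchanged.
path = os.path.join(tempfile.mkdtemp(), "example.bin")
with open(path, "wb") as f:
    f.write(payload)

# Reading the file back in binary mode returns exactly what was written.
with open(path, "rb") as f:
    round_trip = f.read()

print(decode_ok)                  # False
print(as_text.startswith("b'"))   # True
print(round_trip == payload)      # True
```

The same reasoning applies to the requests module, where the `content` attribute of the response is also a bytes object.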