python, python-2.7, python-3.x, python-requests, urllib2

Download PDFs from links listed in a CSV with the Python Requests module


How can I download thousands of PDFs from links listed in a CSV file using the Python Requests module?


Solution

  • I would suggest using Requests; then you can do something along the lines of:

    import os
    import csv
    import requests
    
    write_path = 'folder_name'  # ASSUMING THAT FOLDER EXISTS!
    
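    with open('x.csv', 'r') as csvfile:
        spamreader = csv.reader(csvfile)
        for link in spamreader:
            if not link:
                continue  # Skip blank rows in the CSV
            print('-'*72)
            url = link[0]  # The URL is expected in the first column
            pdf_file = url.split('/')[-1]
            try:
                # Try to request PDF from URL
                print('TRYING {}...'.format(url))
                # Requests has no default timeout, so set one to avoid hanging forever
                a = requests.get(url, stream=True, timeout=30)
                a.raise_for_status()  # Treat HTTP 4xx/5xx responses as errors
                # Open the output file only after the request succeeded,
                # so a failed request does not leave an empty file behind
                with open(os.path.join(write_path, pdf_file), 'wb') as pdf:
                    for block in a.iter_content(512):
                        if not block:
                            break

                        pdf.write(block)
                print('OK.')
            except requests.exceptions.RequestException as e:  # This will catch ONLY Requests exceptions
                print('REQUESTS ERROR:')
                print(e)  # This should tell you more details about the error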
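
Two details of the script above can bite when the list grows to thousands of links: `split('/')[-1]` keeps any query string in the file name, and a connection that breaks mid-download leaves a truncated PDF on disk. The following is a minimal sketch of one way to handle both, assuming Python 3. The helper names `filename_from_url` and `save_atomically`, the `.part` suffix, and the `download.pdf` fallback are illustrative choices, not part of Requests.

```python
import os
from urllib.parse import urlsplit, unquote


def filename_from_url(url, default='download.pdf'):
    """Return the last path segment of `url`, ignoring any query string."""
    path = urlsplit(url).path
    name = os.path.basename(unquote(path))
    # Fall back to a default when the URL path ends with '/'
    return name or default


def save_atomically(chunks, dest_path):
    """Write `chunks` to a temporary '.part' file, then rename it into place."""
    part_path = dest_path + '.part'
    try:
        with open(part_path, 'wb') as part:
            for chunk in chunks:
                if chunk:
                    part.write(chunk)
        # Rename only after every chunk was written successfully
        os.replace(part_path, dest_path)
    except BaseException:
        # Remove the partial file, then let the caller see the error
        if os.path.exists(part_path):
            os.remove(part_path)
        raise
```

Inside the loop, this could be used as `save_atomically(a.iter_content(512), os.path.join(write_path, filename_from_url(link[0])))`, so that only complete downloads ever appear under their final name.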
    

    The test content of x.csv is:

    https://www.pabanker.com/media/3228/qtr1pabanker_final-web.pdf
    http://www.pdf995.com/samples/pdf.pdf
    https://tcd.blackboard.com/webapps/dur-browserCheck-BBLEARN/samples/sample.pdf
    http://unec.edu.az/application/uploads/2014/12/pdf-sample.pdf
    

    Sample output:

    $ python test.py
    ------------------------------------------------------------------------
    TRYING https://www.pabanker.com/media/3228/qtr1pabanker_final-web.pdf...
    REQUESTS ERROR:
    ("Connection broken: ConnectionResetError(54, 'Connection reset by peer')", ConnectionResetError(54, 'Connection reset by peer'))
    ------------------------------------------------------------------------
    TRYING http://www.pdf995.com/samples/pdf.pdf...
    OK.
    ------------------------------------------------------------------------
    TRYING https://tcd.blackboard.com/webapps/dur-browserCheck-BBLEARN/samples/sample.pdf...
    OK.
    ------------------------------------------------------------------------
    TRYING http://unec.edu.az/application/uploads/2014/12/pdf-sample.pdf...
    OK.
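
The first link in the sample output failed with a connection reset, which is often transient. With thousands of links it may be worth retrying such failures a few times before giving up. Below is a minimal, library-independent sketch, assuming Python 3. The `retry` helper and its parameters are hypothetical names introduced here for illustration, and the doubling delay is just one possible backoff policy.

```python
import time


def retry(func, attempts=3, delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Call func() up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as error:
            if attempt == attempts:
                raise  # Out of attempts: let the caller handle the error
            wait = delay * 2 ** (attempt - 1)
            print('Attempt {} failed ({}); retrying in {}s'.format(attempt, error, wait))
            sleep(wait)
```

To use it with the script, the download of a single link would be moved into its own function (say, a hypothetical `download_one(url)`) and called as `retry(lambda: download_one(url), exceptions=(requests.exceptions.ConnectionError, requests.exceptions.ChunkedEncodingError))`. The existing `except requests.exceptions.RequestException` clause then reports only the links that still fail after the last attempt.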