Tags: python, python-3.x, pandas, csv, urllib

How to skip a dead link and move onto the next?


I'm trying to get a program to work where I can input a list of image URLs and download all of them automatically to a folder. The problem arises when there's a dead link somewhere in the batch. Obviously, I don't want to go in and manually remove 1000+ dead links, so I just want to "skip" over them.

Here is what I have so far:

import pandas as pd
import urllib.request
import time

def url_to_jpg(i, url, file_path):
    filename = 'image-{}.jpg'.format(i)
    full_path = '{}{}'.format(file_path, filename)
    urllib.request.urlretrieve(url, full_path)
    print('{} saved.'.format(filename))
    return None


FILENAME = 'images.csv'
FILE_PATH = 'images/'


urls = pd.read_csv(FILENAME)

while True:
    try:
        for i, url in enumerate(urls.values):
            url_to_jpg(i, url[0], FILE_PATH);
    except urllib.error.HTTPError:
        continue
        break

I am just a beginner, and that last part with checking for exceptions is the farthest I got.

Sorry for the messy code; I'm in a rush and short on time.


Solution

  • If you can spare the time, replace this code:

    while True:
        try:
            for i, url in enumerate(urls.values):
                url_to_jpg(i, url[0], FILE_PATH);
        except urllib.error.HTTPError:
            continue
            break
    

    with:

    for i, url in enumerate(urls.values):
        try:
            url_to_jpg(i, url[0], FILE_PATH)
        except urllib.error.HTTPError:
            # Dead link (e.g. 404): skip this URL and move on to the next one
            continue
    

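    Note that putting a break statement directly after a continue statement at the same indentation level makes no sense: continue jumps straight back to the top of the loop, so the break can never be reached. Your while True: loop also works against you. Because the try block wraps the entire for loop, the first HTTPError abandons the for loop, and continue restarts it from the first URL, which hits the same dead link again, so the program never gets past it. If no error occurs, the while True: loop simply downloads the whole list again, forever. Moving the try/except inside the for loop means a failure only skips the current URL.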
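With 1000+ links it may also help to catch a broader exception and keep a record of what was skipped. The following is a minimal sketch, assuming the same `image-{i}.jpg` naming as `url_to_jpg`; `download_all` and its `fetch` parameter are hypothetical names introduced here for illustration, not part of urllib or pandas. It catches `urllib.error.URLError`, of which `HTTPError` is a subclass, so links whose host cannot be reached at all are skipped as well, not only those that return an HTTP error status. It also catches `ValueError`, which `urlretrieve` raises for entries that are not valid URLs.

```python
import os
import urllib.error
import urllib.request


def download_all(urls, folder, fetch=urllib.request.urlretrieve):
    """Download each URL into `folder`, skipping any that fail.

    Hypothetical helper (not a urllib/pandas API). `fetch` defaults to
    urllib.request.urlretrieve and is a parameter only so it can be
    replaced, e.g. in tests. Returns (saved_paths, skipped), where
    `skipped` is a list of (url, error_message) pairs.
    """
    os.makedirs(folder, exist_ok=True)
    saved, skipped = [], []
    for i, url in enumerate(urls):
        # Same naming scheme as url_to_jpg: image-0.jpg, image-1.jpg, ...
        path = os.path.join(folder, 'image-{}.jpg'.format(i))
        try:
            fetch(url, path)
        except (urllib.error.URLError, ValueError) as exc:
            # URLError covers HTTPError (404, 403, ...) and unreachable hosts;
            # ValueError is raised for strings that are not URLs at all.
            print('Skipping {}: {}'.format(url, exc))
            skipped.append((url, str(exc)))
        else:
            print('{} saved.'.format(os.path.basename(path)))
            saved.append(path)
    return saved, skipped
```

With the question's DataFrame this could be called as `saved, skipped = download_all(urls.iloc[:, 0], FILE_PATH)`, after which `skipped` lists every dead link together with the reason it failed. If `images.csv` has no header row, pass `header=None` to `pd.read_csv`; otherwise pandas treats the first URL as the column name and it is never downloaded.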