Tags: python, csv, directory, hyperlink, download

Multiple download - CSV file


I have a script (below) that downloads the files linked in each row of a single CSV file. It works well, and all the files are downloaded into the root of my 'Python Project' folder.

Now I would like to add two features. First, I want to process not just one but multiple (20 or more) CSV files, so that I don't have to change the name in open('name1.csv') manually every time the script finishes. Second, the downloads need to be placed in a folder with the same name as the CSV file they come from. Hopefully that's clear enough :)

Then I could have:

  • name1.csv -> name1 folder -> download from name1 csv
  • name2.csv -> name2 folder -> download from name2 csv
  • name3.csv -> name3 folder -> download from name3 csv
  • ...

Any help or suggestions would be much appreciated :) Many thanks!

from collections import Counter
import urllib.request
import csv
import os

with open('name1.csv') as csvfile:  #need to add multiple .csv files here.
    reader = csv.DictReader(csvfile)
    title_counts = Counter()
    
    for row in reader:
        name, ext = os.path.splitext(row['link'])
        title = row['title']
        title_counts[title] += 1
        title_filename = f"{title}_{title_counts[title]}{ext}".replace('/', '-') #need to create a folder for each CSV file with the download inside.
        urllib.request.urlretrieve(row['link'], title_filename)

Solution

  • You need to add an outer loop that iterates over the files in a specific folder. You can use either os.listdir(), which returns a list of all entries in the folder, or glob.iglob() with a *.csv pattern to get only the files with a .csv extension.

    There are also some minor improvements you can make to your code. You're using Counter in a way that could be replaced with a defaultdict or even a plain dict. Also, urllib.request.urlretrieve() is part of a legacy interface that might become deprecated, so you can replace it with a combination of urllib.request.urlopen() and shutil.copyfileobj().

    Finally, to create a folder you can use os.mkdir(), but first you need to check whether the folder already exists using os.path.isdir(); otherwise os.mkdir() raises FileExistsError.

    Full code:

    from os import mkdir
    from os.path import join, splitext, isdir
    from glob import iglob
    from csv import DictReader
    from collections import defaultdict
    from urllib.request import urlopen
    from shutil import copyfileobj
    
    csv_folder = r"/some/path"  # folder that contains the CSV files
    glob_pattern = "*.csv"
    for file in iglob(join(csv_folder, glob_pattern)):
        # newline="" is what the csv module recommends when opening CSV files
        with open(file, newline="") as csv_file:
            reader = DictReader(csv_file)
            # name1.csv -> folder "name1" next to the CSV file
            save_folder, _ = splitext(file)
            if not isdir(save_folder):
                mkdir(save_folder)
            title_counter = defaultdict(int)  # per-CSV count of each title
            for row in reader:
                url = row["link"]
                # as in the question, "/" in a title must not act as a path separator
                title = row["title"].replace("/", "-")
                title_counter[title] += 1
                _, ext = splitext(url)
                save_filename = join(save_folder, f"{title}_{title_counter[title]}{ext}")
                # stream the response body straight into the target file
                with urlopen(url) as req, open(save_filename, "wb") as save_file:
                    copyfileobj(req, save_file)
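
The folder and file-name logic described above can also be pulled out into small helpers, which makes it easy to check without downloading anything. This is only a sketch under a few assumptions: folder_for_csv, ensure_folder and make_namer are hypothetical names that do not appear in the original code, and os.makedirs(..., exist_ok=True) is swapped in for the os.path.isdir() / os.mkdir() pair.

```python
import os
from collections import defaultdict


def folder_for_csv(csv_path):
    """Map a CSV path to its download folder, e.g. 'data/name1.csv' -> 'data/name1'."""
    folder, _ext = os.path.splitext(csv_path)
    return folder


def ensure_folder(folder):
    """Create the folder if it is missing and return its path."""
    # exist_ok=True turns this into a no-op when the folder already exists,
    # so no separate os.path.isdir() check is needed.
    os.makedirs(folder, exist_ok=True)
    return folder


def make_namer(folder):
    """Return a function that builds numbered file names inside `folder`."""
    counts = defaultdict(int)  # how often each title has been seen so far

    def next_name(title, url):
        # Replace path separators so a title like "a/b" stays inside `folder`.
        safe_title = title.replace("/", "-").replace("\\", "-")
        counts[safe_title] += 1
        # Take the extension from the URL, as the original script does.
        _, ext = os.path.splitext(url)
        return os.path.join(folder, f"{safe_title}_{counts[safe_title]}{ext}")

    return next_name
```

Inside the outer loop you would call ensure_folder(folder_for_csv(file)) once per CSV file, create one namer with make_namer(...) for that file, and then pass each row's title and link to it to get the path to write to.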