Tags: python, python-3.x, list, text-files

Script to convert multiple URLs or files to individual PDFs and save to a specific location


I have written a script that downloads files from a hardcoded list of URLs to hardcoded file names. Instead, I want to read the URLs from a saved text file and save each file automatically, with names assigned in chronological order, to a specific folder.

My code (works):

import requests

#input urls and filenames
urls = ['https://www.northwestknowledge.net/metdata/data/pr_1979.nc',
        'https://www.northwestknowledge.net/metdata/data/pr_1980.nc',
        'https://www.northwestknowledge.net/metdata/data/pr_1981.nc']

fns = [r'C:\Users\HBI8\Downloads\pr_1979.nc',
       r'C:\Users\HBI8\Downloads\pr_1980.nc',
       r'C:\Users\HBI8\Downloads\pr_1981.nc']

#defining the inputs
inputs = zip(urls, fns)

#define download function
def download_url(args):
    url, fn = args[0], args[1]
    try:
        r = requests.get(url)
        with open(fn, 'wb') as f:
            f.write(r.content)
    except Exception as e:
        print('Failed:', e)

#loop through all inputs and run download function
for i in inputs:
    result = download_url(i)
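
For reference, zip(urls, fns) pairs each URL with its matching file path, which is why download_url can unpack args[0] and args[1]; a minimal illustration with hypothetical values:

pairs = zip(['https://example.com/a.nc'], [r'C:\Users\HBI8\Downloads\a.nc'])
for args in pairs:
    print(args[0], args[1])   # prints the URL followed by its target path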

Trying to fetch the links from text (error in code):

import requests

# getting all URLS from textfile
file = open('C:\\Users\\HBI8\\Downloads\\testing.txt','r')
#for each_url in enumerate(f):
list_of_urls = [(line.strip()).split() for line in file]
file.close()

#input urls and filenames
urls = list_of_urls

fns = [r'C:\Users\HBI8\Downloads\pr_1979.nc',
       r'C:\Users\HBI8\Downloads\pr_1980.nc',
       r'C:\Users\HBI8\Downloads\pr_1981.nc']

#defining the inputs
inputs = zip(urls, fns)

#define download function
def download_url(args):
    url, fn = args[0], args[1]
    try:
        r = requests.get(url)
        with open(fn, 'wb') as f:
            f.write(r.content)
    except Exception as e:
        print('Failed:', e)

#loop through all inputs and run download function
for i in inputs:
    result = download_url(i)

testing.txt contains those 3 links, one per line.

Error:

Failed: No connection adapters were found for "['https://www.northwestknowledge.net/metdata/data/pr_1979.nc']"
Failed: No connection adapters were found for "['https://www.northwestknowledge.net/metdata/data/pr_1980.nc']"
Failed: No connection adapters were found for "['https://www.northwestknowledge.net/metdata/data/pr_1981.nc']"

PS: I am new to Python, and it would be helpful if someone could advise me on how to loop through the URLs in a text file and save each file individually, in chronological order, instead of hardcoding the names (as I have done).
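
For reference, the error occurs because split() wraps each URL in its own single-element list, so requests.get() receives a list rather than a string. Below is a minimal sketch that reads the URLs directly and derives each file name from the end of the URL; it assumes testing.txt contains one URL per line and that the Downloads folder used above is the target:

import os
import requests

with open('C:\\Users\\HBI8\\Downloads\\testing.txt', 'r') as file:
    urls = [line.strip() for line in file if line.strip()]

for url in urls:
    # e.g. os.path.basename(url) gives 'pr_1979.nc'
    fn = os.path.join(r'C:\Users\HBI8\Downloads', os.path.basename(url))
    r = requests.get(url)
    with open(fn, 'wb') as f:
        f.write(r.content)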


Solution

  • I inserted ',' as a delimiter and split each line on it with the split() function. To build the file name and URL automatically, I use the index positions of the resulting list.

    The data is stored in the txt file in the following order, comma-separated: FileName | Object ID | Base URL.

    # read FileName, Object ID and Base URL from each comma-separated line
    url_file = open('C:\\Users\\HBI8\\Downloads\\testing.txt', 'r')
    fns = []
    list_of_urls = []
    for line in url_file:
        # strip the trailing newline before splitting on ','
        stripped_line = line.strip().split(',')
        print(stripped_line)
        # full URL = Base URL + Object ID; the file name comes from the first column
        list_of_urls.append(stripped_line[2] + stripped_line[1])
        fns.append(stripped_line[0])
    url_file.close()
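
    Assuming each line of testing.txt therefore looks something like
    C:\Users\HBI8\Downloads\pr_1979.nc,pr_1979.nc,https://www.northwestknowledge.net/metdata/data/
    (file name, object ID, base URL), the two lists line up by index and can be fed straight into the download loop from the question:

    inputs = zip(list_of_urls, fns)
    for i in inputs:
        download_url(i)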