I have a list of URLs I want to loop through, clean using BeautifulSoup, and save to .txt files.
This is my code right now with just a couple of items in the list; there will be many more coming in from a .txt file, but for now this keeps it simple.
While the loop works, it is writing the output for both URLs to the same url.txt file. I would like each item in the list to be written to its own unique .txt file.
import urllib.request
from bs4 import BeautifulSoup

x = ["https://www.sec.gov/Archives/edgar/data/1000298/0001047469-13-002555.txt",
     "https://www.sec.gov/Archives/edgar/data/1001082/0001104659-13-011967.txt"]

for url in x:
    # I want to open the URL listed in my list
    fp = urllib.request.urlopen(url)
    test = fp.read()
    soup = BeautifulSoup(test, "lxml")
    output = soup.get_text()
    # and then save the get_text() results to a unique file.
    file = open("url.txt", "w", encoding="utf-8")
    file.write(output)
    file.close()
Thank you for taking a look. Best, George
Create a different filename for each item in the list, like below:
import urllib.request
from bs4 import BeautifulSoup

x = ["https://www.sec.gov/Archives/edgar/data/1000298/0001047469-13-002555.txt",
     "https://www.sec.gov/Archives/edgar/data/1001082/0001104659-13-011967.txt"]

for index, url in enumerate(x):
    # Open the URL listed in the list
    fp = urllib.request.urlopen(url)
    test = fp.read()
    soup = BeautifulSoup(test, "lxml")
    output = soup.get_text()
    # Save the get_text() results to a unique file per URL:
    # url0.txt, url1.txt, ...
    file = open("url%s.txt" % index, "w", encoding="utf-8")
    file.write(output)
    file.close()
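
If you would rather have the filenames reflect the documents themselves instead of their position in the list, you could also derive each name from the URL. Here is a sketch of that variation, assuming the last path segment of each URL (the SEC accession number here) is unique enough to serve as a filename:

from urllib.parse import urlparse
from urllib.request import urlopen
import os
from bs4 import BeautifulSoup

x = ["https://www.sec.gov/Archives/edgar/data/1000298/0001047469-13-002555.txt",
     "https://www.sec.gov/Archives/edgar/data/1001082/0001104659-13-011967.txt"]

for url in x:
    # Use the last path segment of the URL as the filename,
    # e.g. "0001047469-13-002555.txt"
    filename = os.path.basename(urlparse(url).path)
    soup = BeautifulSoup(urlopen(url).read(), "lxml")
    # The "with" block closes the file automatically, even if an error occurs
    with open(filename, "w", encoding="utf-8") as f:
        f.write(soup.get_text())

The with statement is also worth adopting in the enumerate version above, since it guarantees the file is closed even if write() raises an exception.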