
Loop through list of URLs, run BeautifulSoup, write to file


I have a list of URLs I want to run through, clean using BeautifulSoup and save to a .txt file.

This is my code right now with just a couple of items in the list; there will be many more coming in from a .txt file, but for now this keeps it simple.

While the loop is working, it writes the output for both URLs to the same url.txt file. I would like each item in the list to be written to its own unique .txt file.

import urllib.request

from bs4 import BeautifulSoup


x = ["https://www.sec.gov/Archives/edgar/data/1000298/0001047469-13-002555.txt",
     "https://www.sec.gov/Archives/edgar/data/1001082/0001104659-13-011967.txt"]

for url in x:

    # I want to open the URL listed in my list
    fp = urllib.request.urlopen(url)
    test = fp.read()
    soup = BeautifulSoup(test, "lxml")
    output = soup.get_text()

    # and then save the get_text() results to a unique file.
    file = open("url.txt", "w", encoding="utf-8")
    file.write(output)
    file.close()

Thank you for taking a look. Best, George


Solution

  • Create a different filename for each item in the list, as below:

    import urllib.request

    from bs4 import BeautifulSoup


    x = ["https://www.sec.gov/Archives/edgar/data/1000298/0001047469-13-002555.txt",
         "https://www.sec.gov/Archives/edgar/data/1001082/0001104659-13-011967.txt"]

    for index, url in enumerate(x):

        # Open the URL listed in the list
        fp = urllib.request.urlopen(url)
        test = fp.read()
        soup = BeautifulSoup(test, "lxml")
        output = soup.get_text()

        # Save the get_text() results to a unique file: url0.txt, url1.txt, ...
        with open("url%s.txt" % index, "w", encoding="utf-8") as file:
            file.write(output)
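
  • As an alternative to numeric suffixes, you could name each output file after the URL's final path segment, so every file is traceable to its source filing. A minimal sketch of that naming scheme (the fetching and parsing loop stays the same; only the filename changes):

    ```python
    import posixpath
    from urllib.parse import urlsplit

    def filename_for(url):
        # Take the last segment of the URL path, e.g.
        # ".../1000298/0001047469-13-002555.txt" -> "0001047469-13-002555.txt"
        return posixpath.basename(urlsplit(url).path)

    x = ["https://www.sec.gov/Archives/edgar/data/1000298/0001047469-13-002555.txt",
         "https://www.sec.gov/Archives/edgar/data/1001082/0001104659-13-011967.txt"]

    for url in x:
        print(filename_for(url))
    ```

    This keeps filenames unique as long as the last path segments differ, which holds for these SEC EDGAR accession-number URLs; if two URLs ended in the same segment, you would still need a suffix to avoid overwriting.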