python list beautifulsoup urllib

How to extract text from a list of URLs and save each one separately


I have a list of 100 URLs, and each URL points to a page that contains text. I want to extract the text from each URL and save the results separately as text1, text2, text3, and so on. This is as far as I have got:

import urllib.request

list_of_urls = ['abc.com', 'def.com', 'sssj.com']  # ... and so on

text = []
data = urllib.request.urlopen('abc.com')
for line in data:
    line = line.decode('utf-8')
    text.append(line)

The code above only works for one URL. I want to loop over all the URLs in my list and store their output in text1, text2, text3, and so on.


Solution

  • I'm not sure exactly how you want to store the separate texts, but this code builds a dict whose keys are text1, text2, ... and whose values are lists of the lines read from each URL.

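    import urllib.request  # "import urllib" alone does not reliably expose urllib.request

    # urlopen needs full URLs including the scheme (e.g. 'https://abc.com')
    list_of_urls = ['abc.com', 'def.com', 'sssj.com']  # ... and so on

    result = {}
    # start=1 so the keys are text1, text2, ... rather than text0, text1, ...
    for idx, url in enumerate(list_of_urls, start=1):
        text = []
        # the with-block closes the connection once the page has been read
        with urllib.request.urlopen(url) as data:
            for line in data:
                text.append(line.decode('utf-8'))

        result[f"text{idx}"] = text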
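
If "save them separately" means one file per URL rather than one dict entry per URL, the same loop can write files instead. What follows is only a minimal sketch under a few assumptions that are not in the question: the function name `save_texts`, the `out_dir` and `fetch` parameters, the `.txt` extension, the 10-second timeout, and the choice to skip failing URLs are all hypothetical, and the pages are assumed to be UTF-8. It saves the raw page content as downloaded.

```python
import urllib.request
from pathlib import Path


def save_texts(urls, out_dir=".", fetch=None):
    """Download each URL and write it to text1.txt, text2.txt, ... in out_dir.

    `fetch` is a hypothetical hook (url -> bytes) so the loop can be tried
    without network access; by default it uses urllib.request.urlopen.
    Returns a dict mapping each successfully saved URL to its file path.
    """
    if fetch is None:
        def fetch(url):
            # urlopen needs a full URL including the scheme (e.g. "https://...")
            with urllib.request.urlopen(url, timeout=10) as response:
                return response.read()

    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = {}
    for idx, url in enumerate(urls, start=1):
        try:
            raw = fetch(url)
        except (OSError, ValueError) as exc:
            # URLError is a subclass of OSError; ValueError covers malformed URLs
            print(f"skipping {url}: {exc}")
            continue
        target = out_path / f"text{idx}.txt"
        # Assumes UTF-8 pages; undecodable bytes are replaced instead of raising
        target.write_text(raw.decode("utf-8", errors="replace"), encoding="utf-8")
        saved[url] = target
    return saved
```

Calling `save_texts(list_of_urls, out_dir="texts")` would then leave one file per reachable URL in a `texts` folder. The file number follows the URL's position in the list, so a skipped URL leaves a gap in the numbering instead of shifting the later files.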