
In Python, what should I add to fetch URLs from my text file or my XML file, which contains a list of URLs?


I have this code, which works fine with one link. It stores the values (availableOffers, otherpricess, currentprice, page_url) in a prices.csv file.

My problems are: First, I do not know what to write to fetch the URLs from my text file or my XML file instead of the single URL in this code:

from bs4 import BeautifulSoup as soup  
from urllib.request import urlopen as uReq  

page_url = "XXXXXXXXX"


uClient = uReq(page_url)
page_soup = soup(uClient.read(), "html.parser")
uClient.close()


availableOffers = page_soup.find("input", {"id": "availableOffers"})["value"]
otherpricess = page_soup.find("span", {"class": "price"}).text.replace("$", "")
currentprice = page_soup.find("div", {"class": "is"}).text.strip().replace("$", "")


out_filename = "prices.csv"
headers = "availableOffers,otherpricess,currentprice,page_url \n"

f = open(out_filename, "w")
f.write(headers)


f.write(availableOffers + ", " + otherpricess + ", " + currentprice + ", " + page_url + "\n")

f.close()  

Second problem: when a URL does not have a value for otherpricess, I get this error:

line 13, in <module> 
otherpricess = page_soup.find("span", {"class": "price"}).text.replace("$", "")
AttributeError: 'NoneType' object has no attribute 'text'

How can I bypass this error and make the code keep working even when a value is missing?

Thanks!


Solution

  • To fetch URLs from a text file, you can open the file (exactly as you did for writing) in "r" mode and iterate over its lines.

    For example, let's say you have the following URL file, named urls.txt:

    http://www.google.com
    http://www.yahoo.com
    

    To fetch the URLs and iterate over them, do the following:

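    from bs4 import BeautifulSoup as soup
    from urllib.request import urlopen as uReq

    out_filename = "prices.csv"
    headers = "availableOffers,otherpricess,currentprice,page_url\n"

    with open(out_filename, "w") as fw:
        fw.write(headers)
        with open("urls.txt", "r") as fr:
            for line in fr:
                url = line.strip()  # strip() removes the trailing '\n'
                if not url:  # skip blank lines, e.g. an empty last line
                    continue
                print(url)
                with uReq(url) as uClient:  # the connection is closed automatically
                    page_soup = soup(uClient.read(), "html.parser")
                # the rest of your extraction logic goes here, e.g.:
                availableOffers = page_soup.find("input", {"id": "availableOffers"})["value"]
                otherpricess = page_soup.find("span", {"class": "price"}).text.replace("$", "")
                currentprice = page_soup.find("div", {"class": "is"}).text.strip().replace("$", "")
                # write one row per URL, using the loop variable url (page_url is not defined here)
                fw.write(availableOffers + ", " + otherpricess + ", " + currentprice + ", " + url + "\n")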
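The question also asks about an XML file, which the text-file loop does not cover. The original post does not show the XML layout, so the sketch below assumes a hypothetical urls.xml in which each URL sits in its own <url> element. read_urls_from_xml and its tag parameter are illustrative names, not part of any library; the sketch uses only the standard-library xml.etree.ElementTree module:

```python
import xml.etree.ElementTree as ET


def read_urls_from_xml(path, tag="url"):
    """Return the stripped text of every <tag> element in the XML file at path.

    Assumes a hypothetical layout such as:
        <urls>
            <url>http://www.google.com</url>
            <url>http://www.yahoo.com</url>
        </urls>
    """
    root = ET.parse(path).getroot()
    urls = []
    # iter() walks the whole tree, so nesting depth does not matter
    for element in root.iter(tag):
        text = (element.text or "").strip()  # .text is None for empty elements
        if text:  # skip empty entries
            urls.append(text)
    return urls
```

With a helper like this, the loop header would become "for url in read_urls_from_xml("urls.xml"):" and the loop body stays the same. If your file is a standard sitemap.xml, the URLs are stored in namespaced <loc> elements, so you would pass tag="{http://www.sitemaps.org/schemas/sitemap/0.9}loc".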
    

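Joining the fields with ", " by hand, as the write line does, breaks as soon as a value itself contains a comma (for example a price formatted as 1,299.00), and it leaves a leading space in every field after the first. A possible alternative is the standard-library csv module, which quotes such values automatically. This is a minimal sketch; write_prices and HEADERS are hypothetical names chosen for the example:

```python
import csv

# Column names taken from the question's header line
HEADERS = ["availableOffers", "otherpricess", "currentprice", "page_url"]


def write_prices(out_filename, rows):
    """Write the header plus one row per scraped page to a CSV file.

    rows is any iterable of 4-item sequences in HEADERS order. csv.writer
    quotes a field that contains a comma, quote, or newline, so a value
    like "1,299.00" stays in a single column.
    """
    # the csv module docs recommend opening the file with newline=""
    with open(out_filename, "w", newline="") as fw:
        writer = csv.writer(fw)
        writer.writerow(HEADERS)
        writer.writerows(rows)
```

Inside the URL loop you would append [availableOffers, otherpricess, currentprice, url] to a list and call write_prices("prices.csv", rows) once at the end; alternatively, create the csv.writer once before the loop and call writer.writerow(...) for each URL.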
    Regarding your second question, you can check whether page_soup.find("span", {"class": "price"}) returned None and extract the text only when it did not. For example:

    price_tag = page_soup.find("span", {"class": "price"})  # None when the page has no price span
    otherpricess = price_tag.text.replace("$", "") if price_tag is not None else ""
    # if there is no value, otherpricess will be an empty string, but you can change it to any other value.
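If several fields can be missing, repeating that check for each one gets verbose. One option is a pair of small helpers that return a default when find() comes back with None. text_or_default and attr_or_default are hypothetical names; they rely only on a tag's .text attribute and .get() method, both of which BeautifulSoup tags provide:

```python
def text_or_default(tag, default=""):
    """Return the tag's text with '$' removed and whitespace stripped,
    or default when the tag is missing (find() returned None)."""
    if tag is None:
        return default
    return tag.text.replace("$", "").strip()


def attr_or_default(tag, attr, default=""):
    """Return the tag's attr attribute, or default if the tag
    or the attribute is missing."""
    if tag is None:
        return default
    return tag.get(attr, default)


# Hypothetical usage inside the URL loop, assuming page_soup is already parsed:
# availableOffers = attr_or_default(page_soup.find("input", {"id": "availableOffers"}), "value")
# otherpricess = text_or_default(page_soup.find("span", {"class": "price"}))
# currentprice = text_or_default(page_soup.find("div", {"class": "is"}))
```

Unlike the original otherpricess line, text_or_default also strips surrounding whitespace, which matches what the currentprice line already did.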