I'm trying to write a script that scrapes a website for product information.
Currently, the program uses a for-loop to scrape for a product price and a unique ID.
The for-loop contains two if-statements to stop it from scraping NoneTypes.

import requests
from bs4 import BeautifulSoup

def average(price_list):
    return sum(price_list) / len(price_list)

# Requests search data from Website
page_link = 'URL'
page_response = requests.get(page_link, timeout=5)  # gets the webpage (search) from Website
page_content = BeautifulSoup(page_response.content, 'html.parser')  # turns the webpage it just retrieved into a BeautifulSoup-object

# Selects the product listings from page content so we can work with these
product_listings = page_content.find_all("div", {"class": "unit flex align-items-stretch result-item"})

prices = []  # Creates a list to add the prices to
uids = []  # Creates a list to store the unique ids

for product in product_listings:
    # UIDS
    if product.find('a')['id'] is not None:
        uid = product.find('a')['id']
        uids.append(uid)

    # PRICES
    if product.find('p', class_ = 'result-price man milk word-break') is not None:  # assures that the loop only finds the prices
        price = int(product.p.text[:-2].replace(u'\xa0', ''))  # makes a temporary variable where the last two chars of the string (,-) and whitespace are removed, turns into int
        prices.append(price)  # adds the price to the list

On the line if product.find('a')['id'] is not None: I get an exception: TypeError: 'NoneType' object is not subscriptable.
However, if I run print(product.find('a')['id']), I get the value I'm looking for, which really confuses me. Doesn't that mean the result is not a NoneType?
Also, the check if product.find('p', class_ = 'result-price man milk word-break') is not None: works flawlessly.
I've tried assigning product.find('p', class_ = 'result-price man milk word-break') to a variable and then using it in the for-loop, but that did not work.
I've also done my fair share of googling, but to no avail. The problem might be that I'm relatively new to programming and don't know exactly what to search for. I've found a lot of answers to seemingly related problems, but none of them work in my code.
Any help would be greatly appreciated!
Just make an intermediate step:
res = product.find('a')
# .get('id') returns None when the attribute is missing; res['id'] would raise KeyError
if res is not None and res.get('id') is not None:
    uids.append(res['id'])
That way, if find returns None because the item was not found, you will not end up trying to subscript None.
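
To make the intermediate step concrete, here is a minimal sketch run against made-up markup. SAMPLE_HTML, the result-item class, and the extract_uids helper are assumptions for illustration only, not the real page or part of your script. It may also explain the confusing print output: if the first few listings have a link, print shows their ids, and the TypeError only appears once the loop reaches a listing where find('a') returns None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real search page:
# one listing with a link and an id, one without a link, one link without an id.
SAMPLE_HTML = """
<div class="result-item"><a id="item-1" href="#">First</a></div>
<div class="result-item"><p>No link here</p></div>
<div class="result-item"><a href="#">Link without id</a></div>
"""


def extract_uids(listings):
    """Collect the id of the first <a> in each listing, skipping listings without one."""
    uids = []
    for product in listings:
        link = product.find('a')  # None when the listing has no <a> at all
        if link is None:
            continue
        uid = link.get('id')  # None when the <a> has no id attribute
        if uid is not None:
            uids.append(uid)
    return uids


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
listings = soup.find_all('div', class_='result-item')
uids = extract_uids(listings)
print(uids)  # ['item-1']
```

Calling find once and reusing the result also avoids searching each listing twice.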