
Why am I receiving an attribute error for my BeautifulSoup code when the variable in question has a value?

I am using Python 3.9.1 with Selenium and BeautifulSoup to create my first web scraper for Tesco's website (a mini project to teach myself). However, when I run the code shown below, I receive an attribute error:

Traceback (most recent call last):
  File "c:\Users\Ozzie\Dropbox\My PC (DESKTOP-HFVRPAV)\Desktop\Tesco\", line 37, in <module>
    clean_product_data = process_products(html)
  File "c:\Users\Ozzie\Dropbox\My PC (DESKTOP-HFVRPAV)\Desktop\Tesco\", line 23, in process_products
    weight = product_price_weight.find("span",{"class":"weight"}).text.strip()
AttributeError: 'NoneType' object has no attribute 'find'

I am unsure what is going wrong - the title and URL sections work fine, but the weight and price sections raise this error. When I tried printing the product_price and product_price_weight variables, they returned the values I expected (I won't post them here, as it's just very long HTML).

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager
import time
from bs4 import BeautifulSoup

driver = webdriver.Chrome(ChromeDriverManager().install())

def process_products(html):
    clean_product_list = []
    soup = BeautifulSoup(html, 'html.parser')
    products = soup.find_all("div",{"class":"product-tile-wrapper"})

    for product in products:
        data_dict = {}
        product_details = product.find("div",{"class":"product-details--content"})
        product_price = product.find("div",{"class":"price-control-wrapper"})
        product_price_weight = product.find("div",{"class":"price-per-quantity-weight"})

        data_dict['title'] = product_details.find('a').text.strip()
        data_dict['product_url'] = ('') + (product_details.find('a')['href'])
        weight = product_price_weight.find("span",{"class":"weight"}).text.strip()
        data_dict['price'] = product_price.find("span",{"class":"value"}).text.strip()
        data_dict['price'+weight] = product_price_weight.find("span",{"class":"value"}).text.strip()
    return clean_product_list 

master_list = []

for i in range (1,3):
    print (i)
    html = driver.page_source
    clean_product_data = process_products(html)

print (master_list)

Any help is much appreciated. Many thanks,


  • You can try this by updating your process_products function. Bear in mind that there are cases where a variable you call .find() on is None, which simply means the earlier .find() did not find any element matching the parameters you gave it. That can happen for only some products, so printing the variable may well show valid HTML for the tiles that matched before the loop reaches one that did not.

    For example, let's say this line has been executed:

    product_details = product.find("div",{"class":"product-details--content"})

    If it finds an element with that tag and class, it returns a bs4 Tag object; if not, it returns None. Let's say it returned None.

    Your product_details variable is now None, and further down your code does this with it:

    data_dict['title'] = product_details.find('a').text.strip()
    # In other words, this is the same as writing
    # data_dict['title'] = None.find('a').text.strip()  # clearly an error
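
    To see why printing can look fine while the loop still fails, here is a small sketch. The HTML is made up for illustration (only the class names come from the question): the first tile has the price-per-quantity-weight div and the second does not, so the same line works once and then raises the error from the traceback.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second tile has no price-per-quantity-weight div.
html = """
<div class="product-tile-wrapper">
  <div class="price-per-quantity-weight"><span class="weight">/kg</span></div>
</div>
<div class="product-tile-wrapper"></div>
"""

soup = BeautifulSoup(html, "html.parser")
tiles = soup.find_all("div", {"class": "product-tile-wrapper"})

results = []
for tile in tiles:
    # find() returns a Tag for the first tile and None for the second.
    block = tile.find("div", {"class": "price-per-quantity-weight"})
    try:
        results.append(block.find("span", {"class": "weight"}).text.strip())
    except AttributeError as err:
        # Calling .find() on None ends up here.
        results.append(str(err))

print(results)
```

    The first entry is the weight text and the second is the AttributeError message, which suggests that on the real page at least one product tile is missing that div.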

    So what I did here is wrap each lookup in a try/except that catches those errors and gives you an empty string instead, indicating that the variable you called .find() on was probably None or that something else went wrong (the point being that no relevant data came back). You could also write this with if/else, but I think try/except is better.

    def process_products(html):
        clean_product_list = []
        soup = BeautifulSoup(html, 'html.parser')
        products = soup.find_all("div",{"class":"product-tile-wrapper"})
        for product in products:
            data_dict = {}
            product_details = product.find("div",{"class":"product-details--content"})
            product_price = product.find("div",{"class":"price-control-wrapper"})
            product_price_weight = product.find("div",{"class":"price-per-quantity-weight"})
            try:
                data_dict['title'] = product_details.find('a').text.strip()
                data_dict['product_url'] = ('') + (product_details.find('a')['href'])
            except (AttributeError, TypeError, KeyError):
                # product_details (or the <a> inside it) is probably None, so
                # fall back to empty strings to show the .find() got nothing
                data_dict['title'] = ''
                data_dict['product_url'] = ''
            try:
                data_dict['price'] = product_price.find("span",{"class":"value"}).text.strip()
            except AttributeError:
                # Same here: product_price or its span was not found
                data_dict['price'] = ''
            try:
                weight = product_price_weight.find("span",{"class":"weight"}).text.strip()
                data_dict['price'+weight] = product_price_weight.find("span",{"class":"value"}).text.strip()
            except AttributeError:
                # Same here again
                weight = ''
                data_dict['price'+weight] = ''
            # Keep the row; without this the function always returns an empty list
            clean_product_list.append(data_dict)
        return clean_product_list
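
    For comparison, the if/else alternative mentioned above might look something like this sketch. text_or_empty is a hypothetical helper (not part of BeautifulSoup), the HTML and the 'unit_price' and 'unit' keys are made up for illustration, and only the class names are taken from the question.

```python
from bs4 import BeautifulSoup


def text_or_empty(parent, name, class_name):
    """Return the stripped text of the first matching child, or '' if absent.

    Hypothetical helper for illustration; it checks for None explicitly
    instead of catching the AttributeError afterwards.
    """
    if parent is None:
        return ""
    child = parent.find(name, {"class": class_name})
    if child is None:
        return ""
    return child.text.strip()


# Made-up markup: the second tile has no price-per-quantity-weight div.
html = """
<div class="product-tile-wrapper">
  <div class="price-control-wrapper"><span class="value">1.50</span></div>
  <div class="price-per-quantity-weight">
    <span class="value">3.00</span><span class="weight">/kg</span>
  </div>
</div>
<div class="product-tile-wrapper">
  <div class="price-control-wrapper"><span class="value">2.00</span></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for product in soup.find_all("div", {"class": "product-tile-wrapper"}):
    price_block = product.find("div", {"class": "price-control-wrapper"})
    weight_block = product.find("div", {"class": "price-per-quantity-weight"})
    rows.append({
        "price": text_or_empty(price_block, "span", "value"),
        "unit_price": text_or_empty(weight_block, "span", "value"),
        "unit": text_or_empty(weight_block, "span", "weight"),
    })

print(rows)
```

    The explicit None checks only cover the missing-element case, so any other kind of error would still surface rather than being silently turned into an empty string. Whether that is preferable to try/except is a matter of taste.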