python web-scraping beautifulsoup attributeerror nonetype

BeautifulSoup 'find()' returns NoneType Value


I've just started to try and code a price tracker with Python, and have already run into an error I don't understand. This is the code:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.com/Corsair-Platinum-Mechanical-Keyboard-Backlit/dp/B082GR814B/'
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0."
                         "4103.116 Safari/537.36"}
targetPrice = 150


def getPrice():
    page = requests.get(URL, headers=HEADERS)
    soup = BeautifulSoup(page.content, 'html.parser')
    price = soup.find(id="priceblock_ourprice").get_text()    # Error happens here
    print(price)


if True:
    getPrice()

I can see that soup.find(id="priceblock_ourprice") returns None, hence the AttributeError, but I don't understand why it returns None. The code worked exactly once and printed the product price. When I ran the script again without changing anything, I got the AttributeError again, and have gotten it consistently ever since.

I've also tried the following:

  • Using html5lib and lxml instead of html.parser.
  • Different ids, to see whether I can access other parts of the page.
  • Other User-Agent strings.
  • A similar program from GitHub that uses the exact same code; it didn't run either.

What is happening here? Any help would be appreciated.


Solution

  • You're getting a CAPTCHA page instead of the product page. Try sending more of the HTTP headers a browser sends so that you get the real page. After I set the Accept-Language HTTP header, I could no longer reproduce the error:

    import requests
    from bs4 import BeautifulSoup
    
    
    URL = 'https://www.amazon.com/Corsair-Platinum-Mechanical-Keyboard-Backlit/dp/B082GR814B/'
    HEADERS = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0",
        'Accept-Language': 'en-US,en;q=0.5',
    }
    
    def getPrice():
        page = requests.get(URL, headers=HEADERS)
        soup = BeautifulSoup(page.content, 'html.parser')
        price = soup.find(id="priceblock_ourprice").get_text()
        print(price)
    
    
    getPrice()
    

    Prints:

    $195.99
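
Since the site may still serve a CAPTCHA page now and then, it can help to check the result of find() before calling get_text() on it. The following is only a minimal sketch: extract_price is a hypothetical helper name, and the two HTML strings are hand-written stand-ins, not real Amazon markup.

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_price(html: str, element_id: str = "priceblock_ourprice") -> Optional[str]:
    """Return the text of the price element, or None if the page does not contain it."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(id=element_id)  # find() returns None when nothing matches
    if tag is None:
        return None
    return tag.get_text(strip=True)


# Offline demonstration with two hand-written snippets (assumed markup).
product_html = '<div><span id="priceblock_ourprice"> $195.99 </span></div>'
blocked_html = "<html><body><p>Enter the characters you see below</p></body></html>"

print(extract_price(product_html))  # $195.99
print(extract_price(blocked_html))  # None
```

In getPrice() you could call extract_price(page.text) and, when it returns None, print or save page.text to see which page the server actually sent back.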