Tags: python, html, web-scraping, beautifulsoup, python-requests-html

Web scraping in Python using Beautiful Soup: AttributeError: 'NoneType' object has no attribute 'text'


I am working on a web scraper using requests-html and Beautiful Soup (I am new to this). For one webpage (https://www.superdrug.com/Make-Up/Face/Primer/Face-Primer/Max-Factor-False-Lash-Effect-Max-Primer/p/788724) I am trying to scrape the price of the product. The HTML is:

    <span class="pricing__now" itemprop="price">8.99</span>
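Because the element carries its own itemprop and class attributes, a short CSS selector can target it directly instead of spelling out its full path from body. The sketch below runs only against the snippet quoted above, not the live page, and uses the built-in "html.parser" so lxml is not required. requests-html's r.html.find also takes CSS selectors, so a selector like 'span[itemprop="price"]' should work there too.

```python
from bs4 import BeautifulSoup

# Only the snippet quoted above, not the full product page.
html = '<span class="pricing__now" itemprop="price">8.99</span>'
soup = BeautifulSoup(html, "html.parser")

# Match by attributes rather than by position in the DOM; either selector works here.
by_itemprop = soup.select_one('span[itemprop="price"]')
by_class = soup.select_one("span.pricing__now")

print(by_itemprop.text, by_class.text)
```

If these selectors match the snippet but not the fetched page, the problem is in which HTML reaches the parser, not in the lookup itself.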

I have tried using soup.find and soup.find_all:

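    from bs4 import BeautifulSoup
    from requests_html import HTMLSession

    session = HTMLSession()  # link is the product URL above

    # Attempt 1: soup.find
    r = session.get(link)
    r.html.render(sleep=3, timeout=30)
    soup = BeautifulSoup(r.content, 'lxml')
    price = soup.find('span', itemprop="price").text

    # Attempt 2: soup.find_all
    r = session.get(link)
    r.html.render(sleep=3, timeout=30)
    soup = BeautifulSoup(r.content, 'lxml')
    price = soup.find_all('span', itemprop="price").text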

and r.html.find:

    r = session.get(link)
    r.html.render(sleep=6, timeout=30)
    price = r.html.find('body > div.pdp-container > div.content-wrapper.pdp > div > div > div.pdp__purchase-options > div.pricing > span:nth-child(2)', first=True).text

In each case I get None or an empty list back, or an AttributeError: 'NoneType' object has no attribute 'text'. I am not sure why I cannot get this information out. Any help would be appreciated.
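Two things in the attempts above line up with the errors described. First, find() returns None when nothing matches, so .text raises the 'NoneType' error. find_all() returns a list-like ResultSet, which has no .text of its own. Second, as far as I can tell from how requests-html works, r.html.render() updates r.html in place but leaves r.content as the original, unrendered response bytes, so BeautifulSoup(r.content, 'lxml') never sees the JavaScript-rendered markup. The sketch below parses r.html.html instead and returns None or [] rather than raising. The helper names are hypothetical, it assumes requests-html and its headless Chromium are installed, and it does not confirm that this particular site shows the price span after rendering.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, sleep=3, timeout=30):
    """Return the page markup *after* requests-html has rendered it.

    Assumes requests-html (plus the Chromium it downloads) is installed.
    r.html.render() updates r.html in place; r.content keeps the original,
    unrendered bytes, so return r.html.html rather than r.content.
    """
    from requests_html import HTMLSession  # lazy import: only needed for live fetches

    session = HTMLSession()
    r = session.get(url)
    r.html.render(sleep=sleep, timeout=timeout)
    return r.html.html


def extract_prices(html):
    """Return every itemprop="price" span's text (an empty list if none match)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns a ResultSet (a list of Tags). Calling .text on the list
    # itself raises AttributeError, so read the text from each element instead.
    return [tag.get_text(strip=True) for tag in soup.find_all("span", itemprop="price")]


def extract_price(html):
    """Return the first price, or None instead of raising on a missing element."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns None when nothing matches; calling .text on None is the
    # "'NoneType' object has no attribute 'text'" error from the question.
    tag = soup.find("span", itemprop="price")
    return tag.get_text(strip=True) if tag is not None else None
```

Usage for a live page would be `extract_price(fetch_rendered_html(link))`; a None result then means the span really is absent from the rendered HTML, rather than the code crashing.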


Solution

  • You can get the price from the JSON data embedded in the page (a <script type="application/ld+json"> block). For example:

    import json
    import requests
    from bs4 import BeautifulSoup
    
    url = "https://www.superdrug.com/Make-Up/Face/Primer/Face-Primer/Max-Factor-False-Lash-Effect-Max-Primer/p/788724"
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0"
    }
    
    soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
    
    # The page has more than one JSON-LD block; the product data is in the second one.
    data = json.loads(soup.select('[type="application/ld+json"]')[1].contents[0])
    
    # uncomment this to print all data:
    # print(json.dumps(data, indent=4))
    
    print(data["offers"]["price"])
    

    Prints:

    8.99
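The [1] index above assumes the product data is always the second JSON-LD block; if the page adds or reorders blocks, that lookup breaks or reads the wrong object. A minimal sketch that scans every application/ld+json block and returns the first offers.price it finds is shown below. The function name is hypothetical, and the sample markup is made up for illustration, not copied from the Superdrug page.

```python
import json

from bs4 import BeautifulSoup


def price_from_json_ld(html):
    """Return offers.price from the first JSON-LD block that has one, else None."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(script.string or "")
        except json.JSONDecodeError:
            continue  # skip empty or malformed blocks
        # A block may hold one object or a list of objects.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            offers = item.get("offers")
            # "offers" may be a single Offer object or a list of them.
            if isinstance(offers, list):
                offers = offers[0] if offers else None
            if isinstance(offers, dict) and "price" in offers:
                return offers["price"]
    return None


# Made-up markup for illustration; not copied from the Superdrug page.
sample = """
<script type="application/ld+json">{"@type": "BreadcrumbList"}</script>
<script type="application/ld+json">
  {"@type": "Product", "offers": {"@type": "Offer", "price": "8.99"}}
</script>
"""
print(price_from_json_ld(sample))
```

With the answer's request, this would be called as `price_from_json_ld(requests.get(url, headers=headers).text)`. It returns None instead of raising when no block carries a price.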