python, web-scraping, beautifulsoup

How to scrape a value from a page with BeautifulSoup


The question is very simple, but my code doesn't work and I don't know why!

I want to scrape the beer's rating from this page https://www.brewersfriend.com/homebrew/recipe/view/16367/southern-tier-pumking-clone with BeautifulSoup, but it doesn't work.

This is my code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.brewersfriend.com/homebrew/recipe/view/16367/southern-tier-pumking-clone'

test_html = requests.get(url).text

soup = BeautifulSoup(test_html, "lxml")

rating = soup.find_all("span", class_="ratingValue")

print(rating)

When I run it, I get nothing back, but the same code works on other pages. Can someone help me? The rating shown on the page is 4.58.


Solution

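  • If you print test_html, you'll see that the server responds with 403 Forbidden instead of the recipe page.

    You need to send headers with your GET request (at least a User-Agent). Note also that the code below matches the span on its itemprop attribute rather than on a CSS class.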

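    import requests
    from bs4 import BeautifulSoup
    
    
    # Without a browser-like User-Agent the server answers 403 Forbidden.
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36'
    }
    
    url = 'https://www.brewersfriend.com/homebrew/recipe/view/16367/southern-tier-pumking-clone'
    
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # raise on 403 (or any 4xx/5xx) instead of parsing an error page
    test_html = response.text
    
    # 'html5lib' must be installed separately (pip install html5lib).
    soup = BeautifulSoup(test_html, 'html5lib')
    
    # Match the span on its itemprop attribute.
    rating = soup.find('span', {'itemprop': 'ratingValue'})
    
    print(rating.text)
    
    # 4.58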
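If you fetch more than one recipe page, a `requests.Session` can carry the header for you on every request. This is a minimal sketch, not part of the original answer: the helper name `fetch_html` and the 10-second timeout are illustrative assumptions, and it assumes the site keeps accepting this browser-style User-Agent.

```python
import requests

# Same browser-like User-Agent string as in the answer above.
USER_AGENT = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36'
)

# A Session sends its headers with every request it makes and reuses
# the underlying connection for repeated requests to the same host.
session = requests.Session()
session.headers.update({'User-Agent': USER_AGENT})


def fetch_html(url, timeout=10):
    """Hypothetical helper: GET url with the session's headers, return the body.

    raise_for_status() turns a 403 Forbidden (or any 4xx/5xx) into an
    exception instead of silently handing an error page to BeautifulSoup.
    """
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text
```

Calling `fetch_html` with the recipe URL from the question should return the page HTML, which you can then pass to `BeautifulSoup` exactly as in the answer. The explicit exception makes the "works on one page, not on another" situation from the question much easier to spot.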
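The selector change can be checked offline, without hitting the site. The sketch below parses a small hypothetical HTML fragment (assumed to mirror the markup the answer's `itemprop` selector targets, not copied from the real page) with Python's built-in `html.parser`, and compares a class-based match with two attribute-based ones. `select_one` uses CSS selectors through the soupsieve package, which is installed as a BeautifulSoup dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: assumes the rating sits in a <span> whose
# itemprop attribute (not its class) is "ratingValue".
html = '<div><span itemprop="ratingValue">4.58</span></div>'

soup = BeautifulSoup(html, 'html.parser')

# Matching on the class attribute finds nothing: this span has no class.
by_class = soup.find_all('span', class_='ratingValue')

# Matching on the itemprop attribute finds the element.
by_itemprop = soup.find('span', attrs={'itemprop': 'ratingValue'})

# A CSS attribute selector expresses the same match.
by_css = soup.select_one('span[itemprop="ratingValue"]')

print(by_class)          # []
print(by_itemprop.text)  # 4.58
print(by_css.text)       # 4.58
```

`class_=` only ever looks at the `class` attribute; for any other attribute, pass it in `attrs` (or use a CSS attribute selector).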