
Price comparison - python


Hi guys, I am trying to create a program in Python that compares prices across websites, but I can't get the prices. I have managed to get the title of the product and the quantity using the code below.

page = requests.get(urls[7],headers=Headers)
soup = BeautifulSoup(page.text, 'html.parser')

title = soup.find("h1",{"class" : "Titlestyles__TitleStyles-sc-6rxg4t-0 fDKOTS"}).get_text().strip()
quantity = soup.find("li", class_="quantity").get_text().strip()
total_price = soup.find('div', class_='Pricestyles__ProductPriceStyles-sc-118x8ec-0 fzwZWj price')
print(title)
print(quantity)
print(total_price)

I am trying to get the price from this website (I am creating a program to look for diaper prices, lol): https://www.drogasil.com.br/fralda-huggies-tripla-protecao-tamanho-m.html

The price never comes back. Even when I try to get the text, it always says the result is NoneType.


Solution

  • Some of the information is built by JavaScript from data stored in <script> sections of the HTML, which is why soup.find() returns None for the price element. You can read that data directly: find the <script> tag and use Python's json library to decode it into a Python structure. For example:

    from bs4 import BeautifulSoup
    import requests
    import json
    
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'}
    url = 'https://www.drogasil.com.br/fralda-huggies-tripla-protecao-tamanho-m.html'
    req = requests.get(url, headers=headers)
    soup = BeautifulSoup(req.content, 'html.parser')
    
    script = soup.find('script', type='application/ld+json')
    data = json.loads(script.text)
    
    title = data['name']
    total_price = data['offers']['price']
    
    quantity = soup.find("li", class_="quantity").get_text().strip()
    
    print(title)
    print(quantity)
    print(total_price)
    

    Giving you:

    HUGGIES FRALDAS DESCARTAVEL INFANTIL TRIPLA PROTECAO TAMANHO M COM 42 UNIDADES
    42 Tiras
    38.79
    

    I recommend you add print(data) to see what other information is available.
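
Once you have looked at print(data), you may find that other product pages nest the same fields differently. Here is a minimal sketch of a more tolerant lookup. It assumes the usual JSON-LD variations: a top-level list, an "@graph" wrapper, and "offers" being either one object or a list. The helper names iter_nodes and find_price and the sample payload are made up for illustration and are not taken from the site. It also assumes the price uses a dot as the decimal separator.

```python
import json


def iter_nodes(data):
    """Yield every JSON-LD object found in a decoded payload."""
    if isinstance(data, list):
        for item in data:
            yield from iter_nodes(item)
    elif isinstance(data, dict):
        yield data
        # Some sites wrap their objects in an "@graph" array.
        yield from iter_nodes(data.get("@graph", []))


def find_price(data):
    """Return (name, price) for the first node with an offer price, else None."""
    for node in iter_nodes(data):
        offers = node.get("offers")
        # "offers" may be a single object or a list of them.
        if isinstance(offers, dict):
            offers = [offers]
        elif not isinstance(offers, list):
            offers = []
        for offer in offers:
            if isinstance(offer, dict) and "price" in offer:
                # Assumes a dot decimal separator, e.g. "38.79".
                return node.get("name"), float(offer["price"])
    return None


# Hypothetical payload shaped like the one decoded in the answer above.
sample = json.loads(
    '{"@type": "Product", "name": "Example", "offers": {"price": "38.79"}}'
)
print(find_price(sample))  # ('Example', 38.79)
```

You would pass it the result of json.loads(script.text) from the code above. Because it returns None instead of raising KeyError, a comparison loop over several URLs can skip pages that carry no price.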