Tags: python-3.x, regex, selenium-webdriver, beautifulsoup

How to find main price and discounted price in a webpage using selenium and python?


I am trying to extract both the main price and the discounted price from a web page, but I can only get one of them. I need a reliable pattern or method (a general regular expression or something similar) that extracts the price and the discounted price from all kinds of pages. Example: https://www.needmode.com/product/%d9%85%db%8c%d8%b2-%d9%be%db%8c%d9%86%da%af-%d9%be%d9%86%da%af-%d8%a8%d8%a7%d8%aa%d8%b1%d9%81%d9%84%d8%a7%db%8c-25-%d9%85%db%8c%d9%84%db%8c%d9%85%d8%aa%d8%b1%db%8c/

My code:

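    import re
    import convert_numbers  # third-party package for Persian/English digits
    from bs4 import BeautifulSoup

    html = driver.page_source  # driver: an already-initialised Selenium WebDriver
    soup = BeautifulSoup(html, "html.parser")

    # Match any <span> whose class contains "price" or "cost"
    for tag in soup.find_all('span', {'class': [re.compile("price"), re.compile("cost")]}):
        price = convert_numbers.persian_to_english(tag.text)  # 512396044
        print(re.sub(r'\D', '', str(price)))
        # Look at the nested <span> elements of this tag, not at the characters of the price string
        spans = tag.find_all('span')
        if len(spans) >= 2:
            main_price = spans[0]        # get the main price ?
            discounted_price = spans[1]  # get the discounted price ?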

The result is the price of a similar product (from the product suggestions section), not the price of the current product, and it does not include the discounted price.


Solution

  • Try:

    import requests
    from bs4 import BeautifulSoup
    
    url = "https://www.needmode.com/product/%d9%85%db%8c%d8%b2-%d9%be%db%8c%d9%86%da%af-%d9%be%d9%86%da%af-%d8%a8%d8%a7%d8%aa%d8%b1%d9%81%d9%84%d8%a7%db%8c-25-%d9%85%db%8c%d9%84%db%8c%d9%85%d8%aa%d8%b1%db%8c/"
    
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    
    current_price = soup.select_one(".price ins").text.split()[0]
    before_price = soup.select_one(".price del").text.split()[0]
    
    print(f"{current_price=} {before_price=}")
    

    Prints:

    current_price='21,700,000' before_price='22,700,000'
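
    If the prices are needed as numbers, or if some pages have no discount or use Persian digits, the same selectors can be wrapped in a small helper. The following is only a sketch under assumptions: the `.summary` scope (used to skip the suggested-products section), the sample HTML, and the helper names `to_int` and `extract_prices` are hypothetical and not taken from the page above, so the selectors would need adjusting per site. With Selenium, `driver.page_source` could be passed in place of the sample string.

```python
import re
from bs4 import BeautifulSoup

# Map Persian and Arabic-Indic digits to ASCII digits.
DIGITS = str.maketrans("۰۱۲۳۴۵۶۷۸۹٠١٢٣٤٥٦٧٨٩", "01234567890123456789")


def to_int(text):
    """Return the digits found in `text` as an int, or None if there are none."""
    digits = re.sub(r"\D", "", text.translate(DIGITS))
    return int(digits) if digits else None


def extract_prices(html, scope=".summary"):
    """Return (current_price, before_price) for the first .price block in `scope`.

    `scope` is an assumed CSS selector for the main product area; if it does
    not match anything, the whole document is searched instead.
    """
    soup = BeautifulSoup(html, "html.parser")
    root = soup.select_one(scope)
    if root is None:
        root = soup
    price = root.select_one(".price")
    if price is None:
        return None, None
    ins_tag = price.select_one("ins")
    del_tag = price.select_one("del")
    if ins_tag is None:
        # No discount markup: treat the whole block as the regular price.
        return to_int(price.get_text()), None
    before = to_int(del_tag.get_text()) if del_tag is not None else None
    return to_int(ins_tag.get_text()), before


# Hypothetical markup: a discounted main product and a suggested product.
SAMPLE = """
<div class="summary">
  <p class="price"><del>22,700,000 تومان</del> <ins>21,700,000 تومان</ins></p>
</div>
<section class="related">
  <span class="price">۵,۱۲۳,۰۰۰ تومان</span>
</section>
"""

print(extract_prices(SAMPLE))  # (21700000, 22700000)
```

    Restricting the search to one container is what keeps the suggested product's price out of the result. No single regular expression is likely to work on every shop, so the scope and the `ins`/`del` selectors should be treated as per-site settings.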