Tags: python, beautifulsoup, css-selectors, html-parsing

How To Extract Nested HTML Using BeautifulSoup


I need to extract the price using BeautifulSoup for the HTML code below.

<div class="price-original">
  <span class="product-price-amount">
    <span class="notranslate"> £899.89</span>
  </span>
</div>

I'm unable to use the code below because several prices on the web page use the same HTML structure.

price1 = soup.find('div', class_='price-original').find('span', class_="notranslate").text.strip().replace("£","").replace(",","")
print('Price:', price1)

For this reason, I need a way to select the price based on all three nested HTML elements, since that combination is unique on the page.


Solution

  • From a cursory review of the site, it looks like the price you are interested in (i.e., the final price of the main product on the page) sits inside different HTML elements depending on whether the product is discounted.

    Assuming your soup actually contains this information, for discounted products, try:

    soup.select_one('div.product-price-container span.you-pay-value')
    

    For non-discounted products, try:

    soup.select_one('div.product-price-container span.product-price-amount > span.notranslate')
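The two selectors can be combined into one helper that tries the discounted selector first and falls back to the regular one. This is only a minimal sketch: the sample markup, the extract_price name, and the exact nesting inside product-price-container are assumptions made for illustration, so check them against the real page source.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class names come from the selectors in this answer,
# but the nesting inside product-price-container is assumed for illustration.
DISCOUNTED_HTML = """
<div class="product-price-container">
  <div class="price-original">
    <span class="product-price-amount">
      <span class="notranslate"> £1,099.00</span>
    </span>
  </div>
  <span class="you-pay-value"> £899.89</span>
</div>
"""

REGULAR_HTML = """
<div class="product-price-container">
  <div class="price-original">
    <span class="product-price-amount">
      <span class="notranslate"> £899.89</span>
    </span>
  </div>
</div>
"""


def extract_price(html):
    """Return the main product price as a float, or None if nothing matches."""
    soup = BeautifulSoup(html, "html.parser")
    # Try the discounted-price selector first, then the regular-price one.
    tag = soup.select_one("div.product-price-container span.you-pay-value")
    if tag is None:
        tag = soup.select_one(
            "div.product-price-container span.product-price-amount > span.notranslate"
        )
    if tag is None:
        # select_one returns None when no element matches the selector.
        return None
    # Strip whitespace, the currency symbol, and thousands separators.
    text = tag.get_text(strip=True).replace("£", "").replace(",", "")
    return float(text)


print("Discounted:", extract_price(DISCOUNTED_HTML))
print("Regular:", extract_price(REGULAR_HTML))
```

Checking for None before reading the text avoids the AttributeError that a chained .find(...).text call raises when an element is missing.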