python, web-scraping, beautifulsoup, html-parsing

How to get only the text of a parent div and exclude its child divs using BeautifulSoup


I decided to play around with web scraping and got stuck on a tricky div block. I've spent hours searching and trying to get the output I would have expected by default, but I can't work out the right approach.

I'm having problems with the div with class "listing__details-pricing". It comes in three different forms. Form 3 gives the output I expect; Forms 1 and 2 also return the text of their nested child elements, which I don't want.

Form 1:

<div class="listing__details-pricing">
   €16,000 
   <div class="listing__details-private-seller">Private</div>
</div>

Form 2:

<div class="listing__details-pricing">
   €16,000
   <div class="listing__details-pricing-monthly">
      <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512">
         <path d="M235.4 172.2c0-11.4 9.3-19.9 20.5-19.9 11.4 0 20.7 8.5 20.7 19.9s-9.3 20-20.7 20c-11.2 0-20.5-8.6-20.5-20zm1.4 35.7H275V352h-38.2V207.9z"></path>
         <path d="M256 76c48.1 0 93.3 18.7 127.3 52.7S436 207.9 436 256s-18.7 93.3-52.7 127.3S304.1 436 256 436c-48.1 0-93.3-18.7-127.3-52.7S76 304.1 76 256s18.7-93.3 52.7-127.3S207.9 76 256 76m0-28C141.1 48 48 141.1 48 256s93.1 208 208 208 208-93.1 208-208S370.9 48 256 48z"></path>
      </svg>
      €306
      <div class="listing__details-pricing-monthly-per-month">PER MONTH</div>
   </div>
</div>

Form 3:

<div class="listing__details-pricing">€16,250</div>

Code:

from bs4 import BeautifulSoup


html = """<html>
<body>
       <div class="vehicle-search-form__results">
                         <div class="listing__details listing__details--desktop">
                            <div class="listing__details-location">Meath</div>
                            <div class="listing__details-vehicle">
                               <h2>VOLKSWAGEN Golf</h2>
                               <p>1.6 TDI MATCH EDITION BLUEMOTION 110PS 5DR</p>
                            </div>
                            <div class="listing__details-data">
                               <div class="listing__details-data-year">
                                  <p>2016</p>
                               </div>
                               <div class="listing__details-data-reg">(161 REG)</div>
                               <div class="listing__details-data-mileage">140,012 km</div>
                            </div>
                            <div class="listing__details-pricing">
                               €16,000
                               <div class="listing__details-private-seller">Private</div>
                            </div>
                            <div class="listing__details-color">
                               <span class="" style="background-color: black;"></span>
                               <p>Black</p>
                            </div>
                         </div>
                      
         
                 
                         <div class="listing__details listing__details--desktop">
                            <div class="listing__details-location">Longford</div>
                            <div class="listing__details-vehicle">
                               <h2>VOLKSWAGEN Passat</h2>
                               <p>2.0 TDI SE BUSINESS</p>
                            </div>
                            <div class="listing__details-data">
                               <div class="listing__details-data-year">
                                  <p>2015</p>
                               </div>
                               <div class="listing__details-data-reg">(152 REG)</div>
                               <div class="listing__details-data-mileage">164,778 km</div>
                            </div>
                            <div class="listing__details-pricing">€16,250</div>
                            <div class="listing__details-color">
                               <span class="" style="background-color: black;"></span>
                               <p>Black</p>
                            </div>
                         </div>
                         
                         <div class="listing__details listing__details--desktop">
                            <div class="listing__details-location">Monaghan</div>
                            <div class="listing__details-vehicle">
                               <h2>VOLKSWAGEN Passat</h2>
                               <p>HIGHLINE BE 2.0 TDI MANUAL 6SPEED FWD 150HP 4DR</p>
                            </div>
                            <div class="listing__details-data">
                               <div class="listing__details-data-year">
                                  <p>2016</p>
                               </div>
                               <div class="listing__details-data-reg">(161 REG)</div>
                               <div class="listing__details-data-mileage">230,000 km</div>
                            </div>
                            <div class="listing__details-pricing">
                               €16,000
                               <div class="listing__details-pricing-monthly">
                                  <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512">
                                     <path d="M235.4 172.2c0-11.4 9.3-19.9 20.5-19.9 11.4 0 20.7 8.5 20.7 19.9s-9.3 20-20.7 20c-11.2 0-20.5-8.6-20.5-20zm1.4 35.7H275V352h-38.2V207.9z"></path>
                                     <path d="M256 76c48.1 0 93.3 18.7 127.3 52.7S436 207.9 436 256s-18.7 93.3-52.7 127.3S304.1 436 256 436c-48.1 0-93.3-18.7-127.3-52.7S76 304.1 76 256s18.7-93.3 52.7-127.3S207.9 76 256 76m0-28C141.1 48 48 141.1 48 256s93.1 208 208 208 208-93.1 208-208S370.9 48 256 48z"></path>
                                  </svg>
                                  €306
                                  <div class="listing__details-pricing-monthly-per-month">PER MONTH</div>
                               </div>
                            </div>
                            <div class="listing__details-color">
                               <span class="" style="background-color: black;"></span>
                               <p>Black</p>
                            </div>
                         </div>
             <div class="ais-InfiniteScroll-sentinel"></div>
          </div>

</body>
</html>
"""

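soup = BeautifulSoup(html, "html.parser")
# Container that holds every listing on the page.
results = soup.find(class_="vehicle-search-form__results")

# Each listing card; a space-separated class_ string matches the exact class attribute value.
job_elements = results.find_all(class_="listing__details listing__details--desktop")
for job_element in job_elements:
    # The pricing div for this listing (one of the three forms above).
    price = job_element.find(class_="listing__details-pricing")

    print(price.text.strip())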

Current output:

€16,000
Private
€16,250
€16,000€306PER MONTH

Expected output:

€16,000
€16,250
€16,000
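Why the extra text appears: `.text` (a property that returns `get_text()`) joins every string in the element's whole subtree, so the nested div's text comes along with the price. The price itself is the first direct child of the pricing div, a bare text node. A minimal sketch using Form 1 from above that contrasts the two:

```python
from bs4 import BeautifulSoup

# Form 1 from the question: a price text node followed by a child <div>.
html = """<div class="listing__details-pricing">
   €16,000
   <div class="listing__details-private-seller">Private</div>
</div>"""

price = BeautifulSoup(html, "html.parser").find(class_="listing__details-pricing")

# get_text() (and .text) concatenates every string in the subtree,
# including the text of the nested <div>.
print(price.get_text(" ", strip=True))  # €16,000 Private

# .contents holds only the direct children: the price string comes first,
# then the child <div> Tag.
first = price.contents[0]
print(type(first).__name__, first.strip())  # NavigableString €16,000
print(price.contents[1].name)  # div
```

Forms 2 and 3 follow the same pattern: the price is always the first entry in `.contents`, which is why taking `contents[0]` works for all three.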

Solution

  • Change the last line to:

    print(price.contents[0].strip())
    

    This prints:

    €16,000
    €16,250
    €16,000
    

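    Or, to take the first text node found anywhere inside the div (`string=` is the current name of this argument; `text=` is the older, deprecated spelling):

    print(price.find(string=True).strip())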
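Both one-liners assume the price text node comes before any child element. If that might not always hold (for example, if the site ever puts a child element ahead of the price), a slightly more defensive sketch collects only the div's direct text children with `find_all(string=True, recursive=False)`. `own_text` is a hypothetical helper name, and the sample HTML is a trimmed version of the three forms above (the SVG is dropped for brevity):

```python
from bs4 import BeautifulSoup


def own_text(tag):
    """Join the text nodes that are direct children of `tag` (hypothetical helper).

    Text inside nested elements is ignored because recursive=False
    only looks at the tag's immediate children.
    """
    parts = tag.find_all(string=True, recursive=False)
    return " ".join(s.strip() for s in parts if s.strip())


# Trimmed versions of Forms 1, 2 and 3 from the question.
html = """
<div class="listing__details-pricing">
   €16,000
   <div class="listing__details-private-seller">Private</div>
</div>
<div class="listing__details-pricing">
   €16,000
   <div class="listing__details-pricing-monthly">€306
      <div class="listing__details-pricing-monthly-per-month">PER MONTH</div>
   </div>
</div>
<div class="listing__details-pricing">€16,250</div>
"""

soup = BeautifulSoup(html, "html.parser")
# class_ with a single class name matches only that class, so the
# "listing__details-pricing-monthly" divs are not selected.
prices = [own_text(div) for div in soup.find_all("div", class_="listing__details-pricing")]
for p in prices:
    print(p)
```

Unlike `contents[0].strip()`, this still returns the price for markup such as `<div><span>From</span> €9,999</div>`, where the first child is a tag rather than text. One caveat: HTML comments are `NavigableString` subclasses in BeautifulSoup, so a comment placed directly inside the div would be included in the result.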