Tags: html, python-3.x, beautifulsoup, extract

Python BeautifulSoup to extract elements from an HTML file


I'm new to BeautifulSoup and would like to use it to extract just the values 98.2% and 94.2% from the HTML below, together with their row labels. I would like to print:

apples: 98.2% bananas: 94.2%

How do I do it? Thanks in advance.

<div>
  <table class="stock">
    <tr>
      <th></th>
      <th scope="col">Equal</th>
      <th scope="col">Total</th>
      <th scope="col">Fruits</th>
    </tr>
    <tr>
      <th scope="row">apples:</th>
      <td>524</td>
      <td>525</td>
      <td class="high">98.2%</td>
    </tr>
    <tr>
      <th scope="row">pears:</th>
      <td>58</td>
      <td>58</td>
      <td class="high">100.0%</td>
    </tr>
    <tr>
      <th scope="row">bananas:</th>
      <td>165</td>
      <td>179</td>
      <td class="high">94.2%</td>
    </tr>
  </table>
</div>

Initially, I tried the following, but it returns all three cells (98.2%, 100.0%, 94.2%), including the one for pears:

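from bs4 import BeautifulSoup

# Read the HTML file; the with-block closes it automatically
with open("stock.html", "r") as html_file:
    index = html_file.read()

soup = BeautifulSoup(index, "html.parser")
# Matches every .high cell inside .stock, so all three rows are returned
element = soup.select(".stock .high")
print(element)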

Solution

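  • Try selecting the rows by their text with the :-soup-contains() pseudo-class (supported by recent versions of Soup Sieve, the selector library BeautifulSoup uses), then reading the row header and the .high cell from each match: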

    from bs4 import BeautifulSoup
    
    html_text = """\
      <table class="stock">
        <tr>
          <th></th>
          <th scope="col">Equal</th>
          <th scope="col">Total</th>
          <th scope="col">Fruits</th>
        </tr>
        <tr>
          <th scope="row">apples:</th>
          <td>524</td>
          <td>525</td>
          <td class="high">98.2%</td>
        </tr>
        <tr>
          <th scope="row">pears:</th>
          <td>58</td>
          <td>58</td>
          <td class="high">100.0%</td>
        </tr>
        <tr>
          <th scope="row">bananas:</th>
          <td>165</td>
          <td>179</td>
          <td class="high">94.2%</td>
        </tr>
      </table>"""
    
    soup = BeautifulSoup(html_text, "html.parser")
    
    for tr in soup.select('tr:-soup-contains("apples", "bananas")'):
        print(tr.th.text, tr.find(class_="high").text)
    

    Prints:

    apples: 98.2%
    bananas: 94.2%
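
  • An alternative, in case :-soup-contains() is not available or substring matching is too loose (it would also match a row labelled "pineapples:"): loop over the rows and compare the row header against a set of wanted names. This is only a sketch under the assumption that every data row has a th with scope="row" and a td with class "high", as in the question; the names WANTED and extract_high are made up for illustration. It also joins the results onto one line, as shown in the question.

```python
from bs4 import BeautifulSoup

# Same table as in the question, condensed for brevity
html_text = """
<table class="stock">
  <tr><th></th><th scope="col">Equal</th><th scope="col">Total</th><th scope="col">Fruits</th></tr>
  <tr><th scope="row">apples:</th><td>524</td><td>525</td><td class="high">98.2%</td></tr>
  <tr><th scope="row">pears:</th><td>58</td><td>58</td><td class="high">100.0%</td></tr>
  <tr><th scope="row">bananas:</th><td>165</td><td>179</td><td class="high">94.2%</td></tr>
</table>
"""

# Hypothetical name: the row labels we want, without the trailing colon
WANTED = {"apples", "bananas"}


def extract_high(html, wanted):
    """Return {row label: text of the .high cell} for the wanted rows."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for tr in soup.select("table.stock tr"):
        header = tr.find("th", scope="row")   # None for the column-header row
        cell = tr.find("td", class_="high")
        if header is None or cell is None:
            continue
        # Exact match on the label, ignoring whitespace and the colon
        name = header.get_text(strip=True).rstrip(":")
        if name in wanted:
            result[name] = cell.get_text(strip=True)
    return result


values = extract_high(html_text, WANTED)
print(" ".join(f"{name}: {pct}" for name, pct in values.items()))
```

    With the table above this prints the two values on a single line. To read from stock.html instead, pass the file's contents to extract_high in place of html_text.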