python, python-3.x, beautifulsoup, html-parser

How to ignore tags by class with BeautifulSoup4 in Python


I'm working on a new project and have run into an issue.

My problem looks like this:

<div class="news">
      <p class="breaking">  </p>
      ...
<p> i need to pull here. </p>

But the <p> with class="breaking" gets in the way. I want to ignore the <p> with class "breaking" and pull only the plain <p>.


Solution

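  • Passing an empty class, class_='', to find_all (or its older alias findAll) may do it; in the example below it matches only the tag that has no class attribute: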

    from bs4 import BeautifulSoup
    
    html = """
    <div class="news">
          <p class="breaking">  </p>
          ...
    <p> i need to pull here. </p>
    
    """
    
    soup = BeautifulSoup(html, 'html.parser')
    
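    # Match only <p> tags whose class is empty, i.e. the <p> without a class
    print(soup.find_all('p', class_=''))
    # Same idea with the older findAll alias; True matches any tag name
    print(soup.findAll(True, {'class': ''}))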
    

    Output

    [<p> i need to pull here. </p>]
    [<p> i need to pull here. </p>]
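
  • If you would rather exclude the "breaking" class by name instead of relying on an empty class value, a filter function or a CSS selector may also work. This is only a minimal sketch that reuses the markup from the question; is_plain_p is a hypothetical helper name, not part of BeautifulSoup, and select assumes a bs4 version that ships with CSS selector support:

```python
from bs4 import BeautifulSoup

html = """
<div class="news">
      <p class="breaking">  </p>
      ...
<p> i need to pull here. </p>
"""

soup = BeautifulSoup(html, "html.parser")


def is_plain_p(tag):
    """Hypothetical helper: True for <p> tags that do not carry the 'breaking' class."""
    # tag.get("class", []) falls back to an empty list when the tag has no class
    return tag.name == "p" and "breaking" not in tag.get("class", [])


# Option 1: pass the filter function to find_all
by_function = soup.find_all(is_plain_p)

# Option 2: CSS selector that skips any <p> with the "breaking" class
by_selector = soup.select("p:not(.breaking)")

# Pull the stripped text out of the matched paragraphs
texts = [p.get_text(strip=True) for p in by_function]
print(texts)
```

    Unlike class_='', both variants also keep <p> tags that carry some other class, which may or may not be what you want.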