Tags: python, python-3.x, beautifulsoup, html-parsing

In BeautifulSoup, how do I search for an element that contains text but also has an ancestor with a certain class?


I'm using BeautifulSoup 4 with Python 3.7. I want to find an element whose text contains " points" and that also has an ancestor DIV whose class attribute contains "article". I have figured out how to search for elements by text:

points_elt = soup.find_all(text=re.compile(' points'))[0]

but I can't figure out how to extend this so that it only matches elements that also have an ancestor with the class "article". This is an example of the element I would like to find:

<div class="article class2">
    ... other elements ...
    <span class="outerSpan">
        <span class="innerSpan">2000 points</span>
    </span>
   ... other element closing tags ...
</div>

This is another example it should work on:

<div class="article class7">
    <p>
        <div class="abc">
            <span class="outerSpan">
                <span>8000 points</span>
            </span>             
        </div>
    </p>
</div>

Solution

  • You can use a CSS selector to restrict the search to descendants of the "article" element, then check each match for the string you are looking for.

    from bs4 import BeautifulSoup

    html = '''<div class="article class2">
        <span class="outerSpan">
            <span class="innerSpan">2000 points</span>
        </span>
    </div>
    '''

    soup = BeautifulSoup(html, 'html.parser')
    # Select .innerSpan elements nested anywhere inside an .article element
    for item in soup.select('.article .innerSpan'):
        if 'points' in item.text:
            print(item.text)
    

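    Or you can match on the text inside the selector itself. Note that this selector matches the "article" element, so item.text is all of the text inside it. The original spelling of the pseudo-class is :contains(); Soup Sieve 2.1 and later deprecate it in favor of :-soup-contains().

    soup = BeautifulSoup(html, 'html.parser')
    # On older Soup Sieve versions, write '.article:contains(points)' instead
    for item in soup.select('.article:-soup-contains(points)'):
        print(item.text.strip())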
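
The first selector relies on the `innerSpan` class, which the second example in the question does not have. A minimal sketch of a class-independent alternative is to start from the matching text nodes, as in the question, and keep only those with a `div` ancestor whose class list includes `article`, using `find_parent`. The helper name `find_in_article` and the extra `other` div in the sample HTML are made up for illustration, and the `<p>` wrapper from the question is left out to keep the sample valid.

```python
import re
from bs4 import BeautifulSoup

# Sample markup: two "article" divs from the question plus one
# hypothetical non-article div that should be ignored.
html = '''
<div class="article class2">
    <span class="outerSpan">
        <span class="innerSpan">2000 points</span>
    </span>
</div>
<div class="other">
    <span>50 points</span>
</div>
<div class="article class7">
    <div class="abc">
        <span class="outerSpan">
            <span>8000 points</span>
        </span>
    </div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')


def find_in_article(soup, pattern):
    """Return tags whose own text matches `pattern` and that sit
    somewhere below a <div> with the class "article"."""
    matches = []
    for text_node in soup.find_all(string=re.compile(pattern)):
        # class_='article' matches any div whose class list contains "article"
        if text_node.find_parent('div', class_='article') is not None:
            # text_node is the string itself; its parent is the enclosing tag
            matches.append(text_node.parent)
    return matches


results = [tag.get_text(strip=True) for tag in find_in_article(soup, r' points')]
print(results)
```

With the sample above this should print the text of the two spans inside the article divs and skip the "50 points" span. Using `string=` rather than the older `text=` keyword assumes Beautiful Soup 4.4.0 or later.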