
Select tags with some specified text in beautiful soup


On an HTML page I have a bunch of tags that look like this:

<a class="country" href="www.google.com" title="Germany">09:18, 9 July 2021</a>

In BeautifulSoup I need to select only those tags for Germany where the year is 2019 (so, for example, the sample tag above doesn't qualify, as it has 2021).

What's the best way to do it? I'm learning BeautifulSoup from scratch, and so far I can only do this:

germany = germany_soup.find_all(attrs={"title": "Germany"})

and then check, for every tag in germany, whether its text attribute contains 2019.
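For illustration, that manual check might look like the sketch below. The sample markup and the names `sample_html` and `germany_2019` are assumptions made up for this example, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Assumed sample markup standing in for the real page.
sample_html = """
<a class="country" href="www.google.com" title="Germany">09:18, 9 July 2021</a>
<a class="country" href="www.google.com" title="Germany">09:18, 9 July 2019</a>
<a class="country" href="www.google.com" title="France">09:18, 9 July 2019</a>
"""

germany_soup = BeautifulSoup(sample_html, "html.parser")

# Step 1: select every tag whose title attribute is "Germany".
germany = germany_soup.find_all(attrs={"title": "Germany"})

# Step 2: keep only the tags whose text contains "2019".
germany_2019 = [tag for tag in germany if "2019" in tag.text]
```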

My question: is this the conventional way to handle this problem, and is there a way to specify '2019' in the find_all call itself, so that I can avoid manually checking in a loop whether each tag.text contains '2019'?


Solution

  • You can use the re module: pass a compiled regular expression as the text filter of find_all, and only the tags whose text matches the pattern are returned.

    import re

    from bs4 import BeautifulSoup

    html = """<a class="country" href="www.google.com" title="Germany">09:18, 9 July 2021</a>
        <a class="country" href="www.google.com" title="Germany">09:18, 9 July 2019</a>
        <a class="country" href="www.google.com" title="Germany">07:11, 9 July 2019</a>
        <a class="country" href="www.google.com" title="Germany">09:18, 9 July 2010</a>
        """

    soup = BeautifulSoup(html, "html.parser")
    # Match <a> tags titled "Germany" whose text contains "2019".
    # `string=` is the current name of this argument; older bs4 releases call it `text=`.
    soup.find_all("a", attrs={"title": "Germany"}, string=re.compile("2019"))
    

    Output:

    [<a class="country" href="www.google.com" title="Germany">09:18, 9 July 2019</a>,
     <a class="country" href="www.google.com" title="Germany">07:11, 9 July 2019</a>]
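One caveat: the text filter of find_all is compared against a tag's `.string`, which is `None` when the tag contains more than one child (for example nested markup), so such tags would be skipped. As a hedged alternative, find_all also accepts a function that receives each tag and returns True or False. The sketch below uses a hypothetical helper, `is_germany_2019`, and made-up sample markup with a nested `<b>` tag to show the difference.

```python
from bs4 import BeautifulSoup

# Assumed sample markup; the third link wraps the month in a <b> tag.
html = """
<a class="country" href="www.google.com" title="Germany">09:18, 9 July 2021</a>
<a class="country" href="www.google.com" title="Germany">09:18, 9 July 2019</a>
<a class="country" href="www.google.com" title="Germany">07:11, 9 <b>July</b> 2019</a>
<a class="country" href="www.google.com" title="France">09:18, 9 July 2019</a>
"""


def is_germany_2019(tag):
    """Hypothetical filter: <a> tags titled "Germany" whose full text contains "2019"."""
    return (
        tag.name == "a"
        and tag.get("title") == "Germany"
        and "2019" in tag.get_text()  # get_text() joins text from nested tags too
    )


soup = BeautifulSoup(html, "html.parser")

# find_all calls the function once per tag and keeps those returning True.
matches = soup.find_all(is_germany_2019)
texts = [tag.get_text() for tag in matches]
```

Here `texts` holds the text of both 2019 Germany links, including the one with nested markup. If every tag is known to contain plain text only, the regular expression version is shorter and works just as well.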