Tags: python, html, text, beautifulsoup, findall

Get BeautifulSoup div class content


I'm working with BeautifulSoup and want to access the text inside a div. My code is below.

attack = atackersoup.findAll("div", {"class":"col-12 description"})

My output is below:

<div class="col-12 description">
                A denial of service vulnerability was identified that exists in Apache SpamAssassin before 3.4.2.
            </div>

I just want the text, without the div tags.


Solution

  • findAll returns a list of tags, so to get the text of the first match, index into the list and use .text:

    print(attack[0].text.strip())
    

    Output:

    A denial of service vulnerability was identified that exists in Apache SpamAssassin before 3.4.2.
    

    Here is the full code:

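    from bs4 import BeautifulSoup

    html = """
    <div class="col-12 description">
                    A denial of service vulnerability was identified that exists in Apache SpamAssassin before 3.4.2.
                </div>
    """

    # The 'html5lib' parser must be installed separately (pip install html5lib);
    # the built-in 'html.parser' also works here.
    soup = BeautifulSoup(html, 'html5lib')

    # find returns the first matching tag, not a list.
    div = soup.find('div', class_="col-12 description")

    # .text gives the tag's text content; strip() removes the surrounding whitespace.
    print(div.text.strip())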
    

    Since you have a list of elements, loop through them and print the text of each one:

    for div in attack:
        print(div.text.strip())
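
If you would rather collect the texts than print them, the loop above can be written as a list comprehension. This is only a minimal sketch: the two-div sample markup is made up for illustration, and it assumes the built-in "html.parser" is good enough for your page. It uses find_all (the current spelling of findAll) and get_text(strip=True), which strips the whitespace for you.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two divs sharing the class from the question.
html = """
<div class="col-12 description">
    First description.
</div>
<div class="col-12 description">
    Second description.
</div>
"""

# "html.parser" ships with Python, so no extra parser install is needed.
soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet of every matching tag.
divs = soup.find_all("div", class_="col-12 description")

# get_text(strip=True) returns the tag's text with surrounding whitespace removed.
texts = [div.get_text(strip=True) for div in divs]

print(texts)
```

Note that class_="col-12 description" matches the class attribute as an exact string, so it would miss a div whose classes appear in a different order. If that matters for your page, a CSS selector such as soup.select("div.col-12.description") should match regardless of order.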