
How can I get attributes with spaces in BeautifulSoup?


I have this html:

<html lang="en" class="no-js">
    <div>
        <p class="price ">
            3.75
        </p>
        <p>21</p>
    </div>
</html>

I want to get the class of the first <p> element exactly as written, including the trailing space.

The problem is that whatever I try, the value always comes back without the space.

current_element.get('class')...

Even str(current_element) comes out like this:

'<p class="price">3.75</p>'

How can I get the raw text of the class attribute, or something close to it? Running a regex over the whole HTML is not an option, because my documents can be 11k lines or more.

Thanks!


Solution

  • If you pass the keyword argument multi_valued_attributes=None to the BeautifulSoup constructor, you will get the class string with the space intact. (Source: https://beautiful-soup-4.readthedocs.io/en/latest/#multi-valued-attributes )

    You will, however, lose the ability to access multi-valued attributes (such as class) as lists; they come back as plain strings instead.

    from bs4 import BeautifulSoup
    html = """<html lang="en" class="no-js">
        <div>
            <p class="price ">
                3.75
            </p>
            <p>21</p>
        </div>
    </html>"""
    
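    # Name the parser explicitly; otherwise bs4 picks one and emits a warning.
    soup = BeautifulSoup(html, "html.parser", multi_valued_attributes=None)
    print(repr(soup.html.div.p["class"]))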
    

    Result:

    'price '
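
To make the trade-off above concrete, here is a minimal sketch comparing the default parse with the multi_valued_attributes=None parse. The second <p> with class="a b" is a hypothetical addition for illustration (it is not in the question's HTML), and html.parser is assumed as the parser. Calling str.split() on the raw string is one way to get the list form back when you need it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <p> mirrors the question, the second
# is added only to show a genuinely multi-valued class attribute.
html = '<div><p class="price ">3.75</p><p class="a b">21</p></div>'

# Default behaviour: class is treated as multi-valued and split on whitespace,
# so the trailing space is lost.
default_soup = BeautifulSoup(html, "html.parser")
default_classes = [p["class"] for p in default_soup.find_all("p")]

# With multi_valued_attributes=None the attribute is kept as the raw string.
raw_soup = BeautifulSoup(html, "html.parser", multi_valued_attributes=None)
raw_classes = [p["class"] for p in raw_soup.find_all("p")]

# Recover the list form on demand by splitting the raw string yourself.
split_classes = [raw.split() for raw in raw_classes]

print(default_classes)  # [['price'], ['a', 'b']]
print(raw_classes)      # ['price ', 'a b']
print(split_classes)    # [['price'], ['a', 'b']]
```

This way the raw value stays available for exact comparisons, while code that expects a list of class names can still get one explicitly.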