python html html-parsing beautifulsoup

How to insert a non-breaking space (&nbsp;) into a BeautifulSoup tag?


I'm trying to add an '&nbsp;' to a BeautifulSoup tag. BS converts tag.string to \&amp;nbsp; instead of &nbsp;. It has to be some encoding issue, but I can't figure it out.

PLEASE NOTE: ignore the backslash '\' character. I had to add it so Stack Overflow would format my question correctly.

from bs4 import BeautifulSoup

html = "<td><span></span></td>"
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("td")
tag.string = "&nbsp;"  # replaces the td's contents with the literal text "&nbsp;"

Current output is html = "\&amp;nbsp;"

Any ideas?


Solution

  • By default, BeautifulSoup uses the "minimal" output formatter, which escapes bare ampersands, so the & in &nbsp; is written out as &amp;.

    The solution is to set the output formatter to None. Quoting the BS source (PageElement docstring):

    # There are five possible values for the "formatter" argument passed in
    # to methods like encode() and prettify():
    #
    # "html" - All Unicode characters with corresponding HTML entities
    #   are converted to those entities on output.
    # "minimal" - Bare ampersands and angle brackets are converted to
    #   XML entities: &amp; &lt; &gt;
    # None - The null formatter. Unicode characters are never
    #   converted to entities.  This is not recommended, but it's
    #   faster than "minimal".
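
To make the difference concrete, here is a minimal sketch that serializes the same tag with the "minimal" formatter and with None. It assumes a reasonably recent bs4 where Tag.decode() accepts a formatter keyword; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<td><span></span></td>", "html.parser")
span = soup.find("span")
span.string = "&nbsp;"

# Default ("minimal") formatter: the bare ampersand is escaped.
escaped = span.decode(formatter="minimal")

# Null formatter: the string is written out exactly as stored.
raw = span.decode(formatter=None)

print(escaped)  # <span>&amp;nbsp;</span>
print(raw)      # <span>&nbsp;</span>
```

Note that the stored string is the same in both cases; only the serialization step differs.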
    

    Example:

    from bs4 import BeautifulSoup
    
    
    html = "<td><span></span></td>"
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.find("span")
    tag.string = '&nbsp;'
    
    print(soup.prettify(formatter=None))
    

    prints:

    <td>
     <span>
      &nbsp;
     </span>
    </td>
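
Since the docstring calls the None formatter "not recommended" (it also leaves real < and & characters in text unescaped), a possible alternative is to store the actual non-breaking space character (U+00A0) and let the "html" formatter turn it into an entity on output. This is only a sketch of that idea, not part of the original answer, and it assumes the "html" formatter maps U+00A0 to &nbsp; as it does in the bs4 versions I have seen.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<td><span></span></td>", "html.parser")
span = soup.find("span")

# Store the real non-breaking space character, not the entity text.
span.string = "\u00a0"

# The "html" formatter converts Unicode characters that have a named
# HTML entity back into that entity when serializing.
as_entity = soup.decode(formatter="html")

print(as_entity)  # <td><span>&nbsp;</span></td>
```

With this approach the tree holds the character itself, so span.string compares equal to "\u00a0" rather than to the six-character text "&nbsp;".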
    

    Hope that helps.