python string beautifulsoup html-parsing markup

Append markup string to a tag in BeautifulSoup


Is it possible to set markup as a tag's content (akin to setting innerHTML in JavaScript)?

For the sake of example, let's say I want to add 10 <a> elements to a <div>, but have them separated with a comma:

soup = BeautifulSoup(<<some document here>>)

a_tags = ["<a>1</a>", "<a>2</a>", ...] # list of strings
div = soup.new_tag("div")
a_str = ",".join(a_tags)

Using div.append(a_str) escapes < and > into &lt; and &gt;, so I end up with

<div>&lt;a&gt;1&lt;/a&gt;,&lt;a&gt;2&lt;/a&gt; ... </div>

BeautifulSoup(a_str) wraps this in <html>, and I see getting the tree out of it as an inelegant hack.
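
Both behaviours are easy to reproduce. The following is only a minimal sketch, assuming Python 3 and bs4 with the built-in html.parser; the variable names are illustrative. Whether the <html> wrapper appears depends on the parser: lxml and html5lib add it, html.parser does not.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("", "html.parser")
div = soup.new_tag("div")

a_str = ",".join(["<a>1</a>", "<a>2</a>"])

# A plain str is stored as a text node, so the markup is escaped on output.
div.append(a_str)
escaped = str(div)
print(escaped)  # <div>&lt;a&gt;1&lt;/a&gt;,&lt;a&gt;2&lt;/a&gt;</div>

# html.parser parses the fragment as-is and adds no <html>/<body> wrapper.
fragment = BeautifulSoup(a_str, "html.parser")
print(fragment)  # <a>1</a>,<a>2</a>
```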

What to do?


Solution

  • Create a BeautifulSoup object from the HTML string containing the links and append it to the div. With the html.parser parser, the fragment is not wrapped in <html>:

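    from bs4 import BeautifulSoup

    # start from an empty document; naming the parser keeps behaviour consistent
    soup = BeautifulSoup('', 'html.parser')
    div = soup.new_tag('div')

    a_tags = ["<a>1</a>", "<a>2</a>", "<a>3</a>", "<a>4</a>", "<a>5</a>"]
    a_str = ",".join(a_tags)

    # parse the markup string and append the resulting nodes to the div
    div.append(BeautifulSoup(a_str, 'html.parser'))

    soup.append(div)
    print(soup)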
    

    Prints:

    <div><a>1</a>,<a>2</a>,<a>3</a>,<a>4</a>,<a>5</a></div>
    

    Alternative solution:

    For each link, create a Tag and append it to the div. Append a comma after every link except the last:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('', 'html.parser')
    div = soup.new_tag('div')

    last = 5
    for x in range(1, last + 1):
        link = soup.new_tag('a')
        link.string = str(x)
        div.append(link)

        # do not append a comma after the last element
        if x != last:
            div.append(",")

    soup.append(div)

    print(soup)
    

    Prints:

    <div><a>1</a>,<a>2</a>,<a>3</a>,<a>4</a>,<a>5</a></div>
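
If the goal is something closer to assigning innerHTML, the first approach can be wrapped in a small function. This is only a sketch under a few assumptions: Python 3, the html.parser parser, and a helper name (set_inner_html) that is made up for illustration and is not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup


def set_inner_html(tag, markup):
    """Hypothetical helper: replace the children of `tag` with parsed `markup`."""
    tag.clear()  # drop whatever the tag contained before
    fragment = BeautifulSoup(markup, "html.parser")
    # Copy the list first, because append() moves each node out of `fragment`.
    for child in list(fragment.contents):
        tag.append(child)
    return tag


soup = BeautifulSoup("<div>old</div>", "html.parser")
set_inner_html(soup.div, ",".join("<a>%d</a>" % i for i in range(1, 4)))
result = str(soup)
print(result)  # <div><a>1</a>,<a>2</a>,<a>3</a></div>
```

Moving the parsed children one at a time means the div never holds a nested BeautifulSoup object, and the existing content is replaced rather than appended to.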