I am trying to edit the inner HTML of some elements in Python using BeautifulSoup. Here is a simple example:
from bs4 import BeautifulSoup
import html
html_str = '<div><span><strong>Hello world</strong></span></div>'
soup = BeautifulSoup(html_str, 'html.parser')
span = soup.select_one('span')
span.replace_with('message: ' + html.unescape(span.decode_contents()) + ', end of message')
print(soup)
I was expecting to get a decoded string, like:
<div>message: <strong>Hello world</strong>, end of message</div>
But instead I got:
<div>message: &lt;strong&gt;Hello world&lt;/strong&gt;, end of message</div>
Notice that this behaviour only happens when the target element contains a child element. For example, if you run the same code on the strong element (with soup.select_one('strong')), it works as expected.
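Here is a minimal sketch of why this happens. When `replace_with()` receives a plain `str`, BeautifulSoup wraps it in a `NavigableString`, which is a text node and not parsed markup. Text nodes are entity-escaped when the tree is serialized, so any tags inside the string come out as `&lt;strong&gt;`. Calling `html.unescape()` on the input does not help, because the escaping happens on output. With the strong element, the inner HTML is just `Hello world`, so there is nothing to escape.

```python
from bs4 import BeautifulSoup, NavigableString

html_str = "<div><span><strong>Hello world</strong></span></div>"

# Replacing the span with a plain str: the str becomes a single text node.
soup = BeautifulSoup(html_str, "html.parser")
span = soup.select_one("span")
span.replace_with("message: " + span.decode_contents() + ", end of message")
inserted = soup.div.contents[0]  # the only child of <div> after the replacement
print(isinstance(inserted, NavigableString))  # True
print(soup)  # <div>message: &lt;strong&gt;Hello world&lt;/strong&gt;, end of message</div>

# Same code on <strong>: its inner HTML has no tags, so nothing gets escaped.
soup_b = BeautifulSoup(html_str, "html.parser")
strong = soup_b.select_one("strong")
strong.replace_with("message: " + strong.decode_contents() + ", end of message")
print(soup_b)  # <div><span>message: Hello world, end of message</span></div>
```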
The easiest way is to call .replace_with with a new BeautifulSoup object, e.g.:
from bs4 import BeautifulSoup
html_str = "<div><span><strong>Hello world</strong></span></div>"
soup = BeautifulSoup(html_str, "html.parser")
span = soup.select_one("span")
span.replace_with(BeautifulSoup(f"message: {str(span)}, end of message", "html.parser"))
print(soup)
Prints:
<div>message: <span><strong>Hello world</strong></span>, end of message</div>
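The output above keeps the `<span>` wrapper because `str(span)` includes the span tag itself. If you want exactly the output from the question, one option is to re-parse only `span.decode_contents()`. When a `BeautifulSoup` object is passed to `replace_with()`, its children are inserted in place of the span. A second option, which is not part of the answer above, avoids re-parsing entirely: insert the text around the span with `insert_before()`/`insert_after()`, then remove the span tag with `unwrap()`. Both are sketches for this particular example.

```python
from bs4 import BeautifulSoup

html_str = "<div><span><strong>Hello world</strong></span></div>"

# Option 1: re-parse only the span's inner HTML, so the <span> wrapper is dropped.
soup = BeautifulSoup(html_str, "html.parser")
span = soup.select_one("span")
fragment = BeautifulSoup(f"message: {span.decode_contents()}, end of message", "html.parser")
span.replace_with(fragment)  # the fragment's children take the span's place
print(soup)  # <div>message: <strong>Hello world</strong>, end of message</div>

# Option 2: no re-parsing; add text nodes around the span, then unwrap it.
soup2 = BeautifulSoup(html_str, "html.parser")
span2 = soup2.select_one("span")
span2.insert_before("message: ")
span2.insert_after(", end of message")
span2.unwrap()  # replaces the span with its children (the <strong> tag)
print(soup2)  # <div>message: <strong>Hello world</strong>, end of message</div>
```

Option 2 leaves the existing `<strong>` tag object in the tree and does not parse it again, which may matter if you hold references to it.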