Let's say we have an HTML file like this:
test.html
<div>
<i>Some text here.</i>
Some text here also.<br>
2 &plus; 4 = 6<br>
2 &lt; 4 = True
</div>
If I pass this HTML to BeautifulSoup, it escapes the & sign of the plus entity, and the output HTML looks something like this:
<div>
<i>Some text here.</i>
Some text here also.<br>
2 &amp;plus 4 = 6<br>
2 &lt; 4 = True
</div>
Example Python 3 code:
from bs4 import BeautifulSoup

# Parse the file with Python's built-in parser
with open('test.html', 'rb') as file:
    soup = BeautifulSoup(file, 'html.parser')

print(soup)
How can I avoid this behavior?
Read the description of the different parser libraries in the documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
Switching to the html5lib parser could solve your problem:
from bs4 import BeautifulSoup

s = '''
<div>
<i>Some text here.</i>
Some text here also.<br>
2 &plus; 4 = 6<br>
2 &lt; 4 = True
</div>'''
soup = BeautifulSoup(s, 'html5lib')
And you get:
>>> soup
<html><head></head><body><div>
<i>Some text here.</i>
Some text here also.<br/>
2 + 4 = 6<br/>
2 &lt; 4 = True
</div></body></html>
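A possible explanation for the difference between the parsers, offered as an assumption rather than something confirmed here: the plus entity (written as `&plus;`, which is a guess at the exact spelling in test.html) is an HTML5 named character reference and is not part of the older HTML 4 entity table. A parser that only consults the HTML 4 table would treat it as unknown text and escape its `&`. The sketch below uses only the Python standard library to show the two tables side by side. The helper name `entity_support` is hypothetical.

```python
import html
from html.entities import html5, name2codepoint


def entity_support(name: str) -> tuple[bool, bool]:
    """Return (known in HTML 4 table, known in HTML5 table) for an entity name."""
    # The HTML5 table stores most names with a trailing semicolon.
    return name in name2codepoint, f"{name};" in html5


# "lt" exists in both tables, "plus" only in the HTML5 one.
print(entity_support("lt"))    # (True, True)
print(entity_support("plus"))  # (False, True)

# html.unescape() follows the HTML5 rules, so it resolves the plus entity.
print(html.unescape("2 &plus; 4 = 6"))  # 2 + 4 = 6
```

If you only need the decoded text and not a parse tree, `html.unescape` may be enough on its own. Otherwise a parser that follows the HTML5 rules, such as html5lib, is the more direct fix.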