I have run into a problem with replacing content. The problem occurs when the HTML contains something like:
<p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>
What I would like to do is replace the p tag's contents while discarding any additional styling or tags inside. In this example, that would mean the strong tags are no longer part of the new string.
However, I find it impossible to replace the contents of the p tag altogether. I have googled my problem/errors but have not been able to come up with a working example.
This is my code and the tests I have tried to run. Some throw an error and others simply don't do anything. You can uncomment any of these to test it for yourself; the results are already appended in the comments.
from bs4 import BeautifulSoup

src = "<p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>"
soup = BeautifulSoup(src, "lxml")

for element in soup.findAll():
    if element.name == 'p':
        print(element)
        #= <p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>
        print(element.text)
        #= Next, go to your /home/pi directory and check if you can see the picture
        print(element.contents)
        #= ['Next, go to your ', <strong>/home/pi</strong>, ' directory and check if you can see the picture']

        # -- test 1:
        # element.string.replace_with("First, go to your /home/pi directory")
        # AttributeError: 'NoneType' object has no attribute 'replace_with'

        # -- test 2:
        # element.replace("First, go to your /home/pi directory")
        # TypeError: 'NoneType' object is not callable

        # -- test 3:
        # new_tag = soup.new_tag('li')
        # new_tag.string = "First, go to your /home/pi directory"
        # element.replace_with(new_tag)
        # print(element)
        # not replaced

        # -- test 4:
        # element.text.replace(str(element), "First, go to your /home/pi directory")
        # print(element)
        # not replaced

        # -- test 5:
        # element.text.replace(element.text, "First go to your /home/pi/ directory")
        # print(element)
        # not replaced

        # -- test 6:
        new_tag = soup.new_tag('li')
        new_tag.string = "First, go to your /home/pi directory"
        element.replaceWith(new_tag)
        print(element)
        # not replaced

        # -- test 7:
        # element.replace_with("First, go to your /home/pi directory")
        # print(element)
        # not replaced
I suspect the problem occurs because element.contents contains multiple items. However, element.text gives me what I need to process the string and replace it, and I don't care about any styling inside.
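That suspicion can be checked directly. The following is only a minimal sketch: the variable names are illustrative, the sample string is shortened, and html.parser is assumed instead of lxml to avoid the extra dependency. It shows that .string is None as soon as a tag has more than one child, which would explain the AttributeError in test 1.

```python
from bs4 import BeautifulSoup

# Shortened version of the problematic markup (assumption: same structure).
src = "<p>Next, go to your <strong>/home/pi</strong> directory</p>"
soup = BeautifulSoup(src, "html.parser")
p = soup.find("p")

# The <p> has three children: text, <strong>, text.
n_children = len(p.contents)

# With more than one child, .string is ambiguous, so bs4 returns None.
p_string = p.string

# The nested <strong> has exactly one child, so its .string is defined.
strong_string = p.strong.string

# .text / get_text() concatenates every string below the tag.
plain_text = p.get_text()

print(n_children, p_string, strong_string, plain_text, sep="\n")
```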
As a last resort I will entertain str.replace'ing the element in the formatted HTML, but I'd much rather handle this in BeautifulSoup if possible.
Sources used:
https://www.tutorialfor.com/questions-59179.htm
https://beautiful-soup-4.readthedocs.io/en/latest/#modifying-the-tree
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#replace-with
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#method-names
I think you can simply assign to element.string with =. There is no need to use .replace(). Assigning to .string replaces everything inside the tag, including the nested strong tag.
from bs4 import BeautifulSoup

src = "<p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>"
soup = BeautifulSoup(src, "html.parser")

print('Original: %s' % soup)

for element in soup.findAll():
    if element.name == 'p':
        element.string = "First, go to your /home/pi directory"

print('Altered: %s' % soup)
Output:
Original: <p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>
Altered: <p>First, go to your /home/pi directory</p>
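As a side note, tests 3, 6 and 7 in the question most likely did modify the tree. replace_with() swaps the tag inside the soup, but the element variable still points at the old, now detached tag, so print(element) keeps showing the original markup. Printing soup instead shows the change. Below is a rough sketch of two alternatives under that assumption; the names soup_a, soup_b, result_a, result_b and detached are only illustrative.

```python
from bs4 import BeautifulSoup

src = "<p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>"
new_text = "First, go to your /home/pi directory"

# Variant A: keep the <p> tag, drop all of its children, add plain text.
soup_a = BeautifulSoup(src, "html.parser")
p = soup_a.find("p")
p.clear()            # removes the text nodes and the <strong> tag
p.append(new_text)   # the string becomes the only child
result_a = str(soup_a)

# Variant B: replace the whole <p> with a new <li>, as in test 6.
soup_b = BeautifulSoup(src, "html.parser")
old_p = soup_b.find("p")
li = soup_b.new_tag("li")
li.string = new_text
old_p.replace_with(li)
result_b = str(soup_b)   # inspect the soup, not the old element
detached = str(old_p)    # the old tag still renders its original markup

print(result_a)
print(result_b)
print(detached)
```

Variant A ends up with the same result as assigning to .string. Variant B is only needed when the tag itself should change as well.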