I have an HTML document like this: https://dropmefiles.com/wezmb. I need to extract the text between <span id="1"> and </span>, but I don't know how. I tried this code:
    from bs4 import BeautifulSoup

    with open("10_01.htm") as fp:
        soup = BeautifulSoup(fp, features="html.parser")

    for a in soup.find_all('span'):
        print(a.string)
But it extracts the text from every span tag. How can I extract only the text between <span id="1"> and </span> in Python?
What you need is the .contents attribute (see the Beautiful Soup documentation). It holds the list of a tag's children. Find the span <span id="1"> ... </span> and read its contents using
    for x in soup.find(id="1").contents:
        print(x)

OR

    # There should be only one element with the id 1, so take its first child.
    x = soup.find(id="1").contents[0]
    print(x)
This will give you:


10


that is, an empty line, followed by 10, followed by another empty line. This is because the text in the HTML really is laid out like that: 10 sits on its own line inside the span. The string is therefore exactly '\n10\n'.
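If you want to see those newlines for yourself, printing the repr of the string makes them visible. This is only a minimal sketch: the inline `html` snippet is a made-up stand-in for 10_01.htm (I'm assuming the span looks roughly like this), and I use the html.parser backend as in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the real 10_01.htm file.
html = '<span id="0">skip me</span><span id="1">\n10\n</span>'
soup = BeautifulSoup(html, "html.parser")

span = soup.find(id="1")       # first element whose id attribute is "1"
first_child = span.contents[0]  # the text node inside that span

# repr() shows the escape sequences instead of printing blank lines.
print(repr(first_child))
```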
If you want just x = '10' from x = '\n10\n', you can do x = x[1:-1], since '\n' is a single character. Hope this helped.
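As a side note, slicing with [1:-1] only works when there is exactly one character of whitespace on each side. A slightly more forgiving sketch uses str.strip() or get_text(strip=True) instead. Again, the `html` string below is a hypothetical stand-in for your file, and the None check is my own addition for the case where no such span exists.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the real 10_01.htm file.
html = '<span id="0">skip me</span><span id="1">\n10\n</span>'
soup = BeautifulSoup(html, "html.parser")

span = soup.find(id="1")
if span is None:
    raise ValueError('no element with id="1" found')

# Option 1: strip whitespace from the first child string.
stripped = span.contents[0].strip()

# Option 2: let Beautiful Soup collect and strip all text inside the span.
text = span.get_text(strip=True)

print(stripped)
print(text)
```

Both variables should hold the bare number as a string; get_text is the handier of the two if the span might contain nested tags.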