I am trying to get the text inside each tag and also the text that follows it, outside the tags. I have tried a couple of approaches but haven't got what I want. Can anyone help? I'd really appreciate it.
html = '''
<b>Doc Type: </b>AABB
<br />
<b>Doc No: </b>BBBBF
<br />
<b>System No: </b>aaa bbb
<br />
<b>VCode: </b>040000033
<br />
<b>G Code: </b>000045
<br />
'''
The expected output:
Doc Type: AABB
Doc No: BBBBF
System No: aaa bbb
VCode: 040000033
G Code: 000045
Here is the code I have tried. It only gave me the text inside the tags, not the text outside them:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
print(soup.find_all('b'))
I also tried the following, but it gave me all the text on the page. I only want the text in these tags and the text that follows each of them:
soup = BeautifulSoup(html, "html.parser")
lines = ''.join(soup.text)
print(lines)
The current output is:
Doc Type:
Doc No:
System No:
VCode:
G Code:
You could use .next_sibling on each of those elements to get the text node that follows the tag.
Code:
html = '''
<b>Doc Type: </b>AABB
<br />
<b>Doc No: </b>BBBBF
<br />
<b>System No: </b>aaa bbb
<br />
<b>VCode: </b>040000033
<br />
<b>G Code: </b>000045
<br />'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
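bs = soup.find_all('b')
for each in bs:
    # next_sibling is the text node that comes right after the <b> tag
    eachFollowingText = each.next_sibling.strip()
    # each.text already ends with a space, so no extra separator is needed
    print(f'{each.text}{eachFollowingText}')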
Output:
Doc Type: AABB
Doc No: BBBBF
System No: aaa bbb
VCode: 040000033
G Code: 000045
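One caveat with the loop above: .next_sibling can be None (when the tag is the last node) or another tag (when no text follows the label), and calling .strip() on either would fail. A minimal sketch of a more defensive variant is below. The helper name label_value_pairs is hypothetical, and treating a missing text node as an empty string is an assumption about what you want in that case.

```python
from bs4 import BeautifulSoup, NavigableString

html = '''
<b>Doc Type: </b>AABB
<br />
<b>Doc No: </b>BBBBF
<br />
<b>System No: </b>aaa bbb
<br />
<b>VCode: </b>040000033
<br />
<b>G Code: </b>000045
<br />'''


def label_value_pairs(markup):
    """Return (label, value) tuples, one per <b> tag (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    pairs = []
    for b in soup.find_all("b"):
        sibling = b.next_sibling
        # Only a text node counts as the value. If the sibling is a tag
        # or None, fall back to an empty string (an assumption).
        value = sibling.strip() if isinstance(sibling, NavigableString) else ""
        pairs.append((b.get_text().strip(), value))
    return pairs


for label, value in label_value_pairs(html):
    print(f"{label} {value}")
```

On the sample markup this prints the same five lines as the output above. For a label with no following text, the value is simply empty instead of raising an error.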