If I find a certain tag using BeautifulSoup:

```python
styling = paragraphs.find_all('w:rpr')
```

I then look at the next tag, and I only want to use it if it is a <w:t> tag. How do I check what type of tag the next tag is?
I tried element.find_next_sibling().startswith('<w:t') on the element, but it says 'NoneType' object is not callable. I also tried element.find_next_sibling().find_all('<w:t'), but it doesn't return anything.

```python
for element in styling:
    next_tag = element.find_next_sibling()
    if ...:  # how do I check that next_tag is a <w:t> tag?
        ...
```

I am using BeautifulSoup and would like to stick with it rather than adding etree or another parser, if that is possible with bs4.
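The two attempts above fail for reasons specific to bs4. A Tag is not a string: bs4 treats an unknown attribute such as tag.startswith as shorthand for tag.find('startswith'), i.e. a search for a child tag named <startswith>. That search returns None, and calling None(...) raises 'NoneType' object is not callable. Likewise, find_all() expects a tag name rather than a piece of markup, and it only searches the tag's descendants, so find_all('<w:t') finds nothing. A minimal sketch reproducing both, assuming html.parser and a made-up one-line snippet:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, parsed with the built-in html.parser
soup = BeautifulSoup('<div><w:rpr></w:rpr><w:t>A</w:t></div>', 'html.parser')
element = soup.find('w:rpr')
next_tag = element.find_next_sibling(True)  # True matches Tag objects only

# A Tag is not a str: next_tag.startswith is read as next_tag.find('startswith'),
# a search for a child tag named <startswith>, which returns None
print(next_tag.startswith)  # None

error_message = None
try:
    next_tag.startswith('<w:t')  # effectively calls None('<w:t')
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'NoneType' object is not callable

# find_all() takes a tag name, not markup, and searches only descendants
print(next_tag.find_all('<w:t'))  # []

# Comparing .name is the check that actually works
print(next_tag.name == 'w:t')  # True
```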
You can see a tag's name with item.name.
The problem is that the text between tags (here, the newlines) becomes NavigableString elements, which are also treated as siblings, and their .name is None.
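For example, listing the .name of every child of the <div> in the same sample markup as the answer's code shows the newline strings between the tags as None. A minimal sketch with html.parser:

```python
from bs4 import BeautifulSoup

text = '''<div>
<w:rpr></w:rpr>
<w:t>A</w:t>
</div>'''

soup = BeautifulSoup(text, 'html.parser')

# The '\n' strings are children (and therefore siblings) just like the tags
names = [child.name for child in soup.div.children]
print(names)  # [None, 'w:rpr', None, 'w:t', None]

# The None entries are NavigableString objects holding the newlines
kinds = [type(child).__name__ for child in soup.div.children]
print(kinds)  # ['NavigableString', 'Tag', 'NavigableString', 'Tag', 'NavigableString']
```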
You would have to skip these elements, or you could loop over all the following siblings with a for loop, find the first <w:t>, and exit the loop with break.

```python
from bs4 import BeautifulSoup as BS

text = '''<div>
<w:rpr></w:rpr>
<w:t>A</w:t>
</div>'''

soup = BS(text, 'html.parser')

all_wrpr = soup.find_all('w:rpr')

for wrpr in all_wrpr:

    # first method: step over the '\n' NavigableString by hand
    next_tag = wrpr.next_sibling
    print('name:', next_tag.name)  # None

    next_tag = wrpr.next_sibling.next_sibling
    # (equivalently, from the first next_tag: next_tag = next_tag.next_sibling)
    print('name:', next_tag.name)  # w:t
    print('text:', next_tag.text)  # A

    # name: None
    # name: w:t
    # text: A

    print('---')

    # second method: loop over all following siblings
    all_siblings = wrpr.next_siblings
    for item in all_siblings:
        if item.name == 'w:t':
            print('name:', item.name)  # w:t
            print('text:', item.text)  # A
            break  # exit after first <w:t>

    # name: w:t
    # text: A
```

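The two methods answer slightly different questions: the loop finds the first <w:t> among all later siblings, even if another tag sits in between, while the question asks whether the immediately next tag is <w:t>. For that stricter check, one option (not from the answer itself) is find_next_sibling(True): passing True matches only Tag objects, so every NavigableString is skipped regardless of how the whitespace is laid out. A minimal sketch; the helper name next_tag_if_wt and the sample markup are hypothetical:

```python
from bs4 import BeautifulSoup

def next_tag_if_wt(element):
    """Return the next sibling tag of element if it is a <w:t>, else None.

    Hypothetical helper: find_next_sibling(True) matches only Tag objects,
    so whitespace NavigableStrings between the tags are skipped.
    """
    next_tag = element.find_next_sibling(True)
    if next_tag is not None and next_tag.name == 'w:t':
        return next_tag
    return None

# Hypothetical sample: the second <w:rpr> is followed by <w:b>, not <w:t>
text = '''<div>
<w:rpr></w:rpr>
<w:t>A</w:t>
<w:rpr></w:rpr>
<w:b></w:b>
<w:t>B</w:t>
</div>'''

soup = BeautifulSoup(text, 'html.parser')
styling = soup.find_all('w:rpr')

results = []
for element in styling:
    wt = next_tag_if_wt(element)
    results.append(wt.text if wt is not None else None)

print(results)  # ['A', None]
```

For the second <w:rpr>, the sibling loop would instead return <w:t>B</w:t>, which may or may not be what you want.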
EDIT: If you test the code with HTML formatted a little differently

```python
text = '''<div>
<w:rpr></w:rpr><w:t>A</w:t>
</div>'''
```

then there is no NavigableString between the two tags, so the first method fails (wrpr.next_sibling is already <w:t>, and next_sibling.next_sibling skips past it), but the second method still works.
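A quick check of that claim, assuming the same html.parser setup: with no newline between the two tags, the two-step next_sibling lands on the trailing newline (name None), whereas looping over next_siblings still finds <w:t>.

```python
from bs4 import BeautifulSoup

text = '''<div>
<w:rpr></w:rpr><w:t>A</w:t>
</div>'''

soup = BeautifulSoup(text, 'html.parser')
wrpr = soup.find('w:rpr')

# First method: the hard-coded second step overshoots the <w:t>
first_method = wrpr.next_sibling.next_sibling
print(repr(first_method.name))  # None (it is the '\n' string after </w:t>)

# Second method: scan the siblings until the first <w:t>
second_method = None
for item in wrpr.next_siblings:
    if item.name == 'w:t':
        second_method = item
        break
print(second_method.text)  # A
```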