BeautifulSoup is handy for HTML parsing in Python, but I am having trouble writing clean code that gets a cell's value directly using .string or .text:
from bs4 import BeautifulSoup
tr ="""
<table>
<tr><td>text1</td></tr>
<tr><td>text2<div>abc</div></td></tr>
</table>
"""
table = BeautifulSoup(tr, "html.parser")
for row in table.findAll("tr"):
    td = row.findAll("td")
    print td[0].text
    print td[0].string
result:
text1
text1
text2abc
None
How can I get the following output instead?
text1
text2
I want to skip the text of the nested inner tag.
I am using beautifulsoup4 4.5.0 with Python 2.7.
You could try this:
for row in table.findAll("tr"):
    td = row.findAll("td")
    t = td[0]
    print t.contents[0]
But that will only work if the text you want always comes before the div tag, because contents[0] is simply the first child node of the td.
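If the text might not come first, one possible alternative is to keep only the string nodes that are direct children of the td and ignore any nested tags. This is a rough sketch rather than a definitive solution: the helper name own_text is made up for illustration, and it assumes you want the direct text pieces joined together and stripped of surrounding whitespace. The __future__ import is there only so the same code runs on both Python 2.7 and Python 3.

```python
from __future__ import print_function  # same code runs on Python 2.7 and 3

from bs4 import BeautifulSoup, NavigableString

html = """
<table>
<tr><td>text1</td></tr>
<tr><td>text2<div>abc</div></td></tr>
</table>
"""


def own_text(tag):
    """Hypothetical helper: return only the text directly inside `tag`.

    Text that belongs to nested tags (such as the inner <div>) is skipped.
    """
    # .children yields direct children only; keep the string nodes among them.
    parts = [child for child in tag.children
             if isinstance(child, NavigableString)]
    return "".join(parts).strip()


soup = BeautifulSoup(html, "html.parser")

# Take the first <td> of every row, as in the question.
results = [own_text(row.find("td")) for row in soup.find_all("tr")]

for value in results:
    print(value)
```

For the table in the question this prints text1 and text2. Note that it would also pick up text that comes after the div, so it behaves differently from contents[0] when a cell looks like text2<div>abc</div>more.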