Tags: python, beautifulsoup, html-parsing

BeautifulSoup to get first value using string/text


BeautifulSoup is handy for HTML parsing in Python, but I'm having trouble writing clean code that gets a cell's own value directly using .string or .text:

from bs4 import BeautifulSoup
tr ="""    
<table>
    <tr><td>text1</td></tr>
    <tr><td>text2<div>abc</div></td></tr>
</table>
"""
table = BeautifulSoup(tr,"html.parser")
for row in table.findAll("tr"):
    td = row.findAll("td")
    print td[0].text
    print td[0].string

Result:

text1
text1
text2abc
None

How can I get this result instead?

text1
text2

I want to skip the text of the extra inner tag.

I'm using beautifulsoup4 4.5.0 with Python 2.7.


Solution

  • You could try this:

    for row in table.findAll("tr"):
        td = row.findAll("td")
        t = td[0]
        print t.contents[0]
    

    But that only works if the text you want always comes before the div tag, because contents[0] is simply the first child of the td.
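
If the text can appear before or after the inner tag, one possible approach is to keep only the text nodes that are direct children of the td. This is a minimal sketch, not the answer's original method: the helper name direct_text and the third table row are made up for illustration, and it assumes the cell's own text is not wrapped in any other tag.

```python
from bs4 import BeautifulSoup, NavigableString

# Same table as in the question, plus a hypothetical third row
# where the text comes after the inner div.
html = """
<table>
    <tr><td>text1</td></tr>
    <tr><td>text2<div>abc</div></td></tr>
    <tr><td><div>abc</div>text3</td></tr>
</table>
"""


def direct_text(tag):
    """Join the text nodes that are direct children of tag, skipping nested tags."""
    parts = [child for child in tag.children if isinstance(child, NavigableString)]
    return "".join(parts).strip()


soup = BeautifulSoup(html, "html.parser")
values = [direct_text(row.find("td")) for row in soup.find_all("tr")]
print(values)
```

Unlike contents[0], this does not depend on where the div sits inside the cell. Note that HTML comments are also NavigableString instances in bs4, so a cell containing comments would need an extra filter.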