python python-3.x beautifulsoup html-parsing

Beautiful Soup: How to extract data from HTML tags with inconsistent content

I want to extract the text from tags that come in two forms:

<td><div><font> Something else</font></div></td>

and

<td><div><font> Something <br/>else</font></div></td>

I am using the .string property. In the first case it gives me the required string (Something else), but in the second case it gives me None.

Is there a better or alternative way to do this?

Solution

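Use the .text property instead of .string. The .string property only returns a value when a tag contains a single string (directly, or through a chain of single children). The <br/> in the second form gives <font> several children, so .string is None there. The .text property (equivalent to calling get_text()) concatenates every string inside the tag, so it works in both cases: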

from bs4 import BeautifulSoup

html1 = '<td><div><font> Something else</font></div></td>'
html2 = '<td><div><font> Something <br/>else</font></div></td>'

if __name__ == '__main__':
    soup1 = BeautifulSoup(html1, 'html.parser')
    div1 = soup1.select_one('div')
    print(div1.text.strip())

    soup2 = BeautifulSoup(html2, 'html.parser')
    div2 = soup2.select_one('div')
    print(div2.text.strip())

which outputs:

Something else
Something else
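
If the text fragments on either side of the <br/> are not already separated by a space, get_text() with a separator may be a safer choice. The following is a minimal sketch, not the only way to do it: cell_text is a hypothetical helper name introduced here for illustration, and the separator=" " choice assumes you want a single space where the <br/> was.

```python
from bs4 import BeautifulSoup


def cell_text(html: str) -> str:
    """Return the text of the first <td> (hypothetical helper for this example)."""
    soup = BeautifulSoup(html, "html.parser")
    td = soup.select_one("td")
    # strip=True trims each text fragment; separator=" " rejoins the fragments
    # with a single space, so text around <br/> does not run together.
    return td.get_text(separator=" ", strip=True)


html1 = "<td><div><font> Something else</font></div></td>"
html2 = "<td><div><font> Something <br/>else</font></div></td>"

print(cell_text(html1))
print(cell_text(html2))

# For comparison: <font> in html2 has three children (text, <br/>, text),
# so .string has no single string to return.
font = BeautifulSoup(html2, "html.parser").font
print(font.string)
```

Both calls to cell_text print Something else, and the last line prints None, which is the behavior described in the question.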