BeautifulSoup: calling .strip() on a tag's .contents[0] throws TypeError: 'NoneType' object is not callable


Case 1:

<li class="chapters">
<i>In</i> 
<i>vitro</i> 
blahblah1 
<i>in</i> 
<i>vitro</i> 
blahblah2 
<a href="***">View details</a>
</li>

Case 2:

<li class="chapters">   
blahblah2 
<a href="***">View details</a>
</li>

I have two problems.

Problem 1: when I use .contents[0].strip() to get the blahblah text, case 2 works, but case 1 throws TypeError: 'NoneType' object is not callable. In case 1, .contents[0] is the tag <i>In</i>. Is that a NoneType? It's a Tag, not None.

Problem 2: how can I handle both cases in one or two lines? I guess case 1 exists because of an input error on the website.

By the way, I'm using BeautifulSoup with lxml to parse the HTML.
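A minimal sketch of why case 1 fails, assuming bs4 is installed and using the built-in "html.parser" instead of lxml (the parse result is the same for this snippet). The markup is case 1 shortened and written without a newline after <li>, an assumption made so that .contents[0] really is the <i> tag as described. On a Tag, accessing an attribute the class does not define is treated as a shortcut for .find(), so first.strip searches for a <strip> child, finds none, and returns None; calling that None raises the TypeError.

```python
from bs4 import BeautifulSoup

# Shortened case 1 markup (assumption: no newline after <li>,
# so contents[0] is the <i> tag rather than a whitespace text node).
html = '<li class="chapters"><i>In</i> <i>vitro</i> blahblah1 <a href="#">View details</a></li>'
soup = BeautifulSoup(html, "html.parser")
first = soup.li.contents[0]

print(type(first).__name__)  # Tag, not a string
# Tag has no .strip method; unknown attributes fall back to find(),
# so this is first.find("strip"), which returns None.
print(first.strip)  # None
try:
    first.strip()  # equivalent to None()
except TypeError as exc:
    print(exc)  # 'NoneType' object is not callable
```

To read the text of a Tag, use .get_text() instead (for example first.get_text(strip=True) gives "In"); str methods such as .strip() are only available on text nodes (NavigableString).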


Solution

  • Select the <a> tag, then get the text just before it using .previous_sibling

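    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, 'lxml')  # html holds the page source
    texts = soup.select('.chapters a')

    for t in texts:
        # the text node right before <a> holds the "blahblah" text
        print(t.previous_sibling.strip())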
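A self-contained version of the solution, run against both markups from the question (the hrefs are replaced with a placeholder "#", and "html.parser" stands in for lxml). It adds an isinstance check that the original answer does not have: .previous_sibling can be a Tag or None when no text sits directly before <a>, and calling .strip() on a Tag would reproduce the original TypeError.

```python
from bs4 import BeautifulSoup, NavigableString

# Both cases from the question; href values are placeholders.
html = """
<li class="chapters">
<i>In</i>
<i>vitro</i>
blahblah1
<i>in</i>
<i>vitro</i>
blahblah2
<a href="#">View details</a>
</li>
<li class="chapters">
blahblah2
<a href="#">View details</a>
</li>
"""

soup = BeautifulSoup(html, "html.parser")
results = []
for a in soup.select(".chapters a"):
    prev = a.previous_sibling
    # Only text nodes (NavigableString, a str subclass) have str.strip();
    # skip Tag or None siblings instead of crashing.
    if isinstance(prev, NavigableString):
        results.append(prev.strip())

print(results)  # ['blahblah2', 'blahblah2']
```

This only picks up the text immediately before the link, so blahblah1 in case 1 is skipped. If every direct text node of the <li> is wanted, li.find_all(string=True, recursive=False) returns them.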