Case 1:
<li class="chapters">
<i>In</i>
<i>vitro</i>
blahblah1
<i>in</i>
<i>vitro</i>
blahblah2
<a href="***">View details</a>
</li>
Case 2:
<li class="chapters">
blahblah2
<a href="***">View details</a>
</li>
I have two problems.
Problem 1: when I use .contents[0].strip() to get the blahblah text, case 2 works, but case 1 throws TypeError: 'NoneType' object is not callable. In case 1, .contents[0] is a tag (<i>In</i>). Is that a NoneType? It is a tag, not None.
Problem 2: how can I handle both cases in one or two lines? I guess case 1 exists because of an input error on the website.
By the way, I use BeautifulSoup with lxml to parse the HTML.
Select the <a> tag, then get the text just before it using .previous_sibling:
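# `soup` is the BeautifulSoup object parsed from the page
links = soup.select('.chapters a')
for a in links:
    # the text node directly before <a> is the title in both cases
    print(a.previous_sibling.strip())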
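For reference, here is a minimal self-contained sketch of the same idea run against both cases. The sample markup is reconstructed from the question, the `href="#"` value is a placeholder, and the helper name `text_before_links` is made up for illustration. It uses the built-in `html.parser` so it runs without lxml; swapping in `"lxml"` should behave the same for this markup. It also hints at why Problem 1 happens: as far as I know, a BeautifulSoup `Tag` treats an unknown attribute such as `.strip` as a lookup for a child tag with that name, which returns `None`, and calling `None` raises the TypeError.

```python
from bs4 import BeautifulSoup, NavigableString

# Sample markup reconstructed from the two cases in the question;
# the href value is a placeholder.
HTML = """
<ul>
<li class="chapters">
<i>In</i>
<i>vitro</i>
blahblah1
<i>in</i>
<i>vitro</i>
blahblah2
<a href="#">View details</a>
</li>
<li class="chapters">
blahblah2
<a href="#">View details</a>
</li>
</ul>
"""


def text_before_links(html, parser="html.parser"):
    """Return the stripped text node that directly precedes each link."""
    soup = BeautifulSoup(html, parser)
    results = []
    for a in soup.select(".chapters a"):
        node = a.previous_sibling
        # Only text nodes (NavigableString) have str methods like .strip();
        # skip anything else instead of calling .strip() on a Tag.
        if isinstance(node, NavigableString):
            results.append(node.strip())
    return results


titles = text_before_links(HTML)
print(titles)

# Problem 1 in miniature: on a Tag, `.strip` is looked up as a child
# tag named "strip", so it evaluates to None rather than a method.
first_i = BeautifulSoup(HTML, "html.parser").find("i")
print(first_i.strip)
```

The `isinstance` check is optional for the two cases shown, but it keeps the loop from failing if some item has a tag rather than text directly before the link.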