I'd like to extract only the parent tag's own text using Requests-HTML, without the text of its child tags. If we have HTML like this
<td>
<a href="">There</a> <a href="">are</a> <a href="">some</a> <a href="">links.</a> The text that we are looking for.
</td>
then
html.find('td', first=True).text
results in
>>> There are some links. The text that we are looking for.
You can use an XPath expression, which the library supports directly:
from requests_html import HTML
doc = """<td>
<a href="">There</a> <a href="">are</a> <a href="">some</a> <a href="">links.</a> The text that we are looking for.
</td>"""
html = HTML(html=doc)
# the list also contains the whitespace "between" the <a> tags
text_list = html.xpath('//td/text()')
# join the pieces and strip the surrounding whitespace
print(''.join(text_list).strip())  # The text that we are looking for.
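The expression //td/text() selects every td element and returns only the text nodes that are its direct children, so the text inside the nested a tags is skipped (//td//text() would select all descendant text nodes, including the link text).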
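
To make the difference between direct and descendant text concrete, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree instead of Requests-HTML. This is a swapped-in technique for illustration, not how Requests-HTML works internally, and it assumes the fragment is well-formed XML with a closing </td>. The names own_text and all_text are made up for this example.

```python
import xml.etree.ElementTree as ET

# Same fragment as above, assumed to be well-formed (closing </td>).
doc = '''<td>
<a href="">There</a> <a href="">are</a> <a href="">some</a> <a href="">links.</a> The text that we are looking for.
</td>'''

td = ET.fromstring(doc)

# Direct text of <td>: its leading text plus the "tail" that follows each child.
# These are the same pieces that //td/text() collects.
direct_parts = [td.text or ''] + [child.tail or '' for child in td]
own_text = ''.join(direct_parts).strip()

# All descendant text, the counterpart of //td//text(), with whitespace collapsed.
all_text = ' '.join(''.join(td.itertext()).split())

print(own_text)  # The text that we are looking for.
print(all_text)  # There are some links. The text that we are looking for.
```

ElementTree stores the text after a child element in that child's tail attribute, which is why the direct text has to be assembled from td.text and the tails rather than read from a single property.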