I have a div that looks somewhat like this:
<div>
" Base Text "
<span>
" Inner Text "
</span>
" Outer Base Text "
</div>
I want to extract only the text that is not inside the div's children (the immediate text). In this example, the immediate text is " Base Text " and " Outer Base Text ".
Is there a direct way (such as a BeautifulSoup function) to get only the div's own text and ignore the contents of its children?
Correction: there is a direct way; see the comment from Barry above. Indirectly, you can iterate over all of the strings inside the tag and use a list comprehension to keep only those whose parent is the div itself:
from bs4 import BeautifulSoup

html_content = '''
<div>
Base Text
<span>
Inner Text
</span>
Outer Base Text
</div>
'''
soup = BeautifulSoup(html_content, 'html.parser')
div = soup.find('div')
# Keep only the strings whose parent is the div itself, excluding text inside children.
# "is" checks object identity; "==" on tags compares them by content instead.
text = ''.join(str(s) for s in div.strings if s.parent is div)
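For comparison, here is a minimal sketch of one direct approach that I believe works: `find_all(string=True, recursive=False)`, which asks only for text nodes that are direct children of the tag. This is not necessarily what the comment mentioned above suggests, and the variable names `direct_strings` and `immediate_text` are just illustrative. It assumes a reasonably recent `bs4`, where the `string` argument is available.

```python
from bs4 import BeautifulSoup

html_content = '''
<div>
Base Text
<span>
Inner Text
</span>
Outer Base Text
</div>
'''

soup = BeautifulSoup(html_content, 'html.parser')
div = soup.find('div')

# string=True matches text nodes; recursive=False limits the search
# to direct children of the div, so the span's text is skipped.
direct_strings = div.find_all(string=True, recursive=False)

# Strip surrounding whitespace and drop pieces that are empty afterwards.
immediate_text = [s.strip() for s in direct_strings if s.strip()]
print(immediate_text)
```

Unlike the joined string from the list comprehension, this keeps the two pieces of immediate text as separate list items, which may be easier to work with if they need to be handled individually.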