python html beautifulsoup text-extraction

Python BeautifulSoup: problem extracting the direct text of a given HTML tag


I am trying to extract the direct text of a given HTML tag, i.e. the text that belongs to the tag itself rather than to its child tags. For example, for `<p> Hello! </p>` the direct text is `Hello!`. The code works well except in the case below.

    from bs4 import BeautifulSoup
    soup = BeautifulSoup('<div> <i> </i> FF Services </div>', "html.parser")
    for tag in soup.find_all():
        direct_text = tag.find(string=True, recursive=False)
        print(tag, ':', direct_text)

Output:

    <div> <i> </i> FF Services </div> :  
    <i> </i> :  

I expected the first line to be `<div> <i> </i> FF Services </div> : FF Services`, but `FF Services` is skipped. When I delete `<i> </i>`, the code works fine.

What's the problem here?
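One way to see what is going on is to list the `<div>`'s direct children in both versions of the markup. This is a small diagnostic sketch assuming the same `html.parser` backend; `with_i` and `without_i` are just illustrative variable names.

```python
from bs4 import BeautifulSoup

# The markup from the question, and the same markup with "<i> </i>" deleted.
with_i = BeautifulSoup('<div> <i> </i> FF Services </div>', "html.parser").div
without_i = BeautifulSoup('<div>  FF Services </div>', "html.parser").div

# .contents lists a tag's direct children in document order.
# With <i> present, the text is split into two strings around the <i> tag.
print([str(child) for child in with_i.contents])     # [' ', '<i> </i>', ' FF Services ']
print([str(child) for child in without_i.contents])  # ['  FF Services ']

# find(string=True, recursive=False) returns only the FIRST direct string child.
print(repr(str(with_i.find(string=True, recursive=False))))     # ' '
print(repr(str(without_i.find(string=True, recursive=False))))  # '  FF Services '
```

With `<i>` present, the first direct string child is the whitespace before `<i>`, which is why only a space is printed; without `<i>`, the whole text is a single string.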


Solution

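  • `.find()` returns only the first direct string child it matches. In this markup the `<div>`'s first direct child is the whitespace string `' '` that sits before `<i>`, so that is what gets printed; `' FF Services '` is the second string child. Using `.find_all()` instead of `.find()` returns every direct string child as a list, so `' FF Services '` is included. Try this code.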

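    from bs4 import BeautifulSoup
    soup = BeautifulSoup('<div> <i> </i> FF Services </div>', "html.parser")
    for tag in soup.find_all():
        # find_all returns all direct string children, e.g. [' ', ' FF Services '] for the div
        direct_text = tag.find_all(string=True, recursive=False)
        print(tag, ':', direct_text)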
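Because `.find_all()` returns a list of strings (including whitespace-only ones), you may want to collapse it into a single cleaned string to get exactly `FF Services`. A minimal sketch, assuming that concatenating the direct strings and stripping the surrounding whitespace is the behaviour you want; `direct_text` here is a hypothetical helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup


def direct_text(tag):
    """Hypothetical helper: join a tag's direct string children and strip the result."""
    # recursive=False limits the search to the tag's immediate children,
    # so text inside nested tags such as <i> is excluded.
    pieces = tag.find_all(string=True, recursive=False)
    return "".join(pieces).strip()


soup = BeautifulSoup('<div> <i> </i> FF Services </div>', "html.parser")
for tag in soup.find_all():
    print(tag, ':', repr(direct_text(tag)))
```

Joining with an empty string preserves the spacing that is in the markup. If child tags split the text into several runs and you want them separated by single spaces, `" ".join(s.strip() for s in pieces if s.strip())` is an alternative.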