Tags: python, html, text, beautifulsoup

Getting specific text from HTML using BeautifulSoup


I have this HTML:

<div id="content">
    <ul id="tree">
        <li xmlns="" class="level top failed open">
            <span><em class="time">
                    <div class="time">1.89 s</div>
                </em>I need to get this text</span>
        </li>
    </ul>
</div>

I need to get only the text that sits outside all of the other tags, i.e. "I need to get this text".

I was trying to use this piece of code:

path = document.find('li', class_='level top').find_all("em")[-1].next_sibling
if not path:
    path = document.find('li', class_='level top failed open').find_all("em")[-1].next_sibling
return path

But I get an error: AttributeError: 'NoneType' object has no attribute 'find_all'.

Does anybody know how to access this text?

Thank you!
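The AttributeError most likely comes from the first find() call. When class_ is given a string that contains spaces, BeautifulSoup compares it with the whole class attribute value, so 'level top' does not match 'level top failed open'; find() returns None, and calling find_all on None raises the error before the fallback line is ever reached. Below is a minimal sketch of the same next_sibling idea, assuming the HTML from the question (with its open tags closed) and swapping class_ for the CSS selector li.level.top, which checks each class separately. The helper name get_target_text is hypothetical.

```python
from bs4 import BeautifulSoup

# HTML from the question, with the open tags closed
html = """
<div id="content">
  <ul id="tree">
    <li xmlns="" class="level top failed open">
      <span><em class="time">
          <div class="time">1.89 s</div>
        </em>I need to get this text</span>
    </li>
  </ul>
</div>
"""


def get_target_text(document):
    """Hypothetical helper: return the text right after the last <em> in the li."""
    # class_="level top" is compared with the full string "level top failed open"
    # and fails; the CSS selector below matches the two classes independently
    li = document.select_one("li.level.top")
    if li is None:
        return None
    em = li.find_all("em")[-1]
    sibling = em.next_sibling
    # The text node directly after </em> is the string we want
    return sibling.strip() if isinstance(sibling, str) else None


document = BeautifulSoup(html, "html.parser")
print(get_target_text(document))
```

Passing the exact attribute value, class_='level top failed open', would also match, but the selector keeps working if the extra classes (failed, open) change.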


Solution

  • You can use .contents, which returns a list of the span's direct children; the text you want is the last one, [-1].

    html = '''
    <div id="content">
     <ul id="tree">
      <li class="level top failed open" xmlns="">
       <span>
        <em class="time">
         <div class="time">
          1.89 s
         </div>
        </em>
        I need to get this text
       </span>
      </li>
     </ul>
    </div>
    
    '''
    
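    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, 'html.parser')

    # .contents lists the span's direct children: the whitespace before <em>,
    # the <em> tag itself, and finally the text node we want
    txt = soup.select_one('#tree > li > span').contents[-1]

    # The text node keeps the surrounding newlines and indentation, so strip it
    print(txt.strip())


    Output:

      I need to get this text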
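If the goal is specifically the text that sits directly inside the span and not inside any child tag, another option is to ask for the span's own string children with find_all(string=True, recursive=False) and drop the whitespace-only ones. This is a minimal sketch that reuses the selector from the answer and assumes the same HTML structure; unlike .contents[-1], it does not rely on the text being the last child.

```python
from bs4 import BeautifulSoup

html = """
<ul id="tree">
  <li class="level top failed open" xmlns="">
    <span>
      <em class="time"><div class="time">1.89 s</div></em>
      I need to get this text
    </span>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
span = soup.select_one("#tree > li > span")

# recursive=False: only strings that are direct children of <span>,
# so "1.89 s" (nested inside <em><div>) is skipped
direct_strings = span.find_all(string=True, recursive=False)

# Drop the whitespace-only strings left by the indentation
texts = [s.strip() for s in direct_strings if s.strip()]
print(texts)
```

Here texts holds a single item, the target sentence; if the span ever contained several loose text fragments, they would all appear in the list in document order.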