Tags: beautifulsoup, jssoup

Does JSSoup support extracting text?


Does JSSoup support extracting text similar to Beautiful Soup soup.findAll(text=True)?

The documentation does not cover this use case, but it seems to me that there should be a way.

To clarify, what I want is to grab all visible text from the page.


Solution

  • In Beautiful Soup you can extract text in several ways: with find_all(text=True) (spelled find_all(string=True) in current versions), but also with .get_text() or .text.

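    JSSoup works similarly to Beautiful Soup, but its method names are camelCase. To extract the text of the whole document, call .getText() or .text on your soup. As in Beautiful Soup, .string only returns a value when the tag contains a single text node.

    Example (JSSoup)

    // Load the library (CommonJS style).
    var JSSoup = require('jssoup').default;

    var soup = new JSSoup('<html><head><body>text<p>ptext</p></body></head></html>');
    // Join every text node with the given separator.
    soup.getText('|')
    // 'text|ptext'
    
    soup.getText('|').split('|')
    // ['text', 'ptext']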
    

    Example (Beautiful Soup)

    from bs4 import BeautifulSoup
    html = '''<html><head><body>text<p>ptext</p></body></head></html>'''
    
    soup = BeautifulSoup(html, "html.parser") 
    print(soup.get_text('|').split('|'))
    

    Output

    ['text', 'ptext']
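One caveat for the "visible text" part of the question: get_text() and find_all(string=True) return every text node, including the contents of script and style tags and HTML comments. A minimal Beautiful Soup sketch of filtering those out is shown below. The helper name visible_strings and the HIDDEN_PARENTS set are illustrative assumptions, not part of either library, and the list of hidden tags is not exhaustive (it ignores CSS such as display: none).

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Assumed, non-exhaustive set of parents whose text a browser does not render.
HIDDEN_PARENTS = {"script", "style", "head", "title", "meta", "[document]"}


def visible_strings(html):
    """Return the stripped, non-empty text nodes that are likely visible."""
    soup = BeautifulSoup(html, "html.parser")
    result = []
    # string=True matches every text node, including comments and script bodies.
    for node in soup.find_all(string=True):
        if isinstance(node, Comment):
            continue  # HTML comments are never rendered
        if node.parent.name in HIDDEN_PARENTS:
            continue  # text inside <script>, <style>, <title>, ...
        text = node.strip()
        if text:  # drop whitespace-only nodes between tags
            result.append(text)
    return result


html = """<html><head><title>t</title><style>p {color: red}</style></head>
<body>text<p>ptext</p><script>var x = 1;</script><!-- note --></body></html>"""
print(visible_strings(html))
```

The same idea should carry over to JSSoup by walking the soup's descendants and skipping text whose parent is a script or style tag, though that variant is not shown here.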