
Getting class data from BeautifulSoup


I am trying to get class data from an HTML page using BeautifulSoup. Here is what the data looks like:

    <div class="quoteText">
      &ldquo;I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.&rdquo;
      <br>  &#8213;
      <span class="authorOrTitle">
        Marilyn Monroe
      </span>
    </div>

I just want the text directly under the class "quoteText", without the text in the class "authorOrTitle".

The following script returns the name of the author as well.

    for div in soup.find('div', {'class': 'quoteText'}):
        print(div)

How can I get the "quoteText" class data without the "authorOrTitle" class data?

Thanks!


Solution

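  • Try this. In this markup the quote is the first child node of the div, so contents[0] returns it without the author span: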

    from bs4 import BeautifulSoup
    
    sample = """<div class="quoteText">
          &ldquo;I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.&rdquo;
      <br>  &#8213;
      <span class="authorOrTitle">
        Marilyn Monroe
      </span>
    </div>
    """
    
    soup = BeautifulSoup(sample, "html.parser")
    
    print(soup.find('div', {'class': 'quoteText'}).contents[0].strip())
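
If part of the quote is wrapped in another tag (an <i>, for example), contents[0] would return only the first piece of it. A minimal sketch of an alternative, assuming the quote always ends at the first <br> inside the div: collect everything that comes before that <br>. The helper name quote_only and the placeholder quote and author are made up for illustration, not taken from the page.

```python
from bs4 import BeautifulSoup, Tag

# Placeholder markup with the same structure as the question's sample.
sample = """<div class="quoteText">
      &ldquo;An example quote.&rdquo;
  <br>  &#8213;
  <span class="authorOrTitle">
    Example Author
  </span>
</div>"""


def quote_only(div):
    """Return the text of `div` that appears before its first <br> child."""
    parts = []
    for child in div.children:
        if isinstance(child, Tag):
            if child.name == "br":
                break  # assumption: the dash and the author follow the <br>
            parts.append(child.get_text())  # keep text of inline tags such as <i>
        else:
            parts.append(str(child))  # plain text node
    return "".join(parts).strip()


soup = BeautifulSoup(sample, "html.parser")

# find_all returns every matching div; find returns only the first one.
for div in soup.find_all("div", class_="quoteText"):
    print(quote_only(div))
```

Note that the loop in the question iterates over the children of a single div, because find returns one tag; find_all is the call that gives you one item per quote.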