Tags: python, web-scraping, quotes, chatbot

How do I scrape ONLY <div class="quoteText"> from a website using Python?


I am trying to import the Einstein quotes from this website:

https://www.goodreads.com/author/quotes/9810.Albert_Einstein

I want only the quote text: not his name, not anything else. Just the text, to help build a Markov chain chatbot.

This is the code I have:

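    from lxml import html
    import requests

    page = requests.get('https://www.goodreads.com/author/quotes/9810.Albert_Einstein')
    tree = html.fromstring(page.content)

    # text() returns every text node that is a direct child of the div,
    # including whitespace and the "―" separator, not just the quote itself
    quotes = tree.xpath('//div[@class="quoteText"]/text()')

    print(quotes)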

And this is the output:

    [u"\n \u201cTwo things are infinite: the universe and human stupidity; and I'm not sure about the universe.\u201d\n ", u' \u2015\n ', '\n',
     u'\n \u201cThere are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cI am enough of an artist to draw freely upon my imagination. Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world.\u201d\n ', u' \u2015\n ', '\n',
     u"\n \u201cIf you can't explain it to a six year old, you don't understand it yourself.\u201d\n ", u' \u2015\n ', '\n',
     u'\n \u201cIf you want your children to be intelligent, read them fairy tales. If you want them to be more intelligent, read them more fairy tales.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cLogic will get you from A to Z; imagination will get you everywhere.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cLife is like riding a bicycle. To keep your balance, you must keep moving.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cAnyone who has never made a mistake has never tried anything new.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cI speak to everyone in the same way, whether he is the garbage man or the president of the university.\u201d\n ', u' \u2015\n ', '\n',
     u"\n \u201cWhen you are courting a nice girl an hour seems like a second. When you sit on a red-hot cinder a second seems like an hour. That's relativity.\u201d\n ", u' \u2015\n ', '\n',
     u'\n \u201cNever memorize something that you can look up.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cA clever person solves a problem. A wise person avoids it.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cScience without religion is lame, religion without science is blind.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cReality is merely an illusion, albeit a very persistent one.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cIf we knew what it was we were doing, it would not be called research, would it?\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cI have no special talents. I am only passionately curious.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cIf a cluttered desk is a sign of a cluttered mind, of what, then, is an empty desk a sign?\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cThe important thing is to not stop questioning. Curiosity has its own reason for existence. One cannot help but be in awe when he contemplates the mysteries of eternity, of life, of the marvelous structure of reality. It is enough if one tries merely to comprehend a little of this mystery each day.', u'\xe2\x80\x94"Old Man\'s Advice to Youth: \'Never Lose a Holy Curiosity.\'" ', u' (2 May 1955) p. 64\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cTry not to become a man of success. Rather become a man of value.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cAny fool can know. The point is to understand.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cA human being is a part of the whole called by us universe, a part limited in time and space. He experiences himself, his thoughts and feeling as something separated from the rest, a kind of optical delusion of his consciousness. This delusion is a kind of prison for us, restricting us to our personal desires and to affection for a few persons nearest to us. Our task must be to free ourselves from this prison by widening our circle of compassion to embrace all living creatures and the whole of nature in its beauty.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cOnce you can accept the universe as matter expanding into nothing that is something, wearing stripes with plaid comes easy.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cIf I were not a physicist, I would probably be a musician. I often think in music. I live my daydreams in music. I see my life in terms of music.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cI know not with what weapons World War III will be fought, but World War IV will be fought with sticks and stones.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cYou never fail until you stop trying.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cGreat spirits have always encountered violent opposition from mediocre minds.\u201d\n ', u' \u2015\n ', '\n',
     u'\n \u201cThe most beautiful experience we can have is the mysterious. It is the fundamental emotion that stands at the cradle of true art and true science.\u201d\n ', u' \u2015\n ', ',\n ', '\n \n\n \n', '\n\n\n', '\n\n',
     u'\n \u201cGravitation is not responsible for people falling in love.\u201d\n ', u' \u2015\n ', '\n',
     u"\n \u201cIt is not that I'm so smart. But I stay with the questions much longer.\u201d\n ", u' \u2015\n ', '\n']

I feel like there must be a better way to do this altogether: the result comes back as a list and is full of extra text (whitespace, the ― separators, stray commas). I keep hitting walls. Any help would be much appreciated!

Thanks
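If you keep the lxml approach, the raw list printed above can be cleaned up after the fact. A minimal sketch, assuming every quote on the page is wrapped in curly double quotes (“ ”) and that quotes never nest: join all the text nodes into one string and keep only what sits between the marks, which also handles quotes that were split across several text nodes. `clean_quotes` is a hypothetical helper name, and `sample` is just a few entries copied from the output above.

```python
import re

# Matches everything between a left (U+201C) and a right (U+201D) double
# quotation mark, non-greedily, across newlines.
QUOTE_RE = re.compile(u"\u201c(.*?)\u201d", re.DOTALL)


def clean_quotes(raw_nodes):
    """Turn the raw text nodes returned by xpath(...) into quote strings.

    Assumes every quote is wrapped in curly double quotes and that quotes do
    not nest; anything outside the marks (the "―" separators, whitespace,
    stray commas) is discarded.
    """
    joined = u"".join(raw_nodes)
    # Collapse internal runs of whitespace so each quote ends up on one line.
    return [u" ".join(match.split()) for match in QUOTE_RE.findall(joined)]


sample = [
    u"\n \u201cTwo things are infinite: the universe and human stupidity; and I'm not sure about the universe.\u201d\n ",
    u" \u2015\n ",
    "\n",
    u"\n \u201cLife is like riding a bicycle. To keep your balance, you must keep moving.\u201d\n ",
    u" \u2015\n ",
    ",\n ",
    "\n\n",
]

for quote in clean_quotes(sample):
    print(quote)
```

With the question's code, `clean_quotes(quotes)` would take the place of the final `print(quotes)`. The "Curiosity" quote would keep its embedded citation text, because on this page that text also sits between the opening and closing marks.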


Solution

  • A Python 2.x script using the BeautifulSoup (version 3) module

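    from __future__ import print_function
    from re import sub
    from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, Python 2 only
    from urllib2 import urlopen

    urlpage = urlopen("https://www.goodreads.com/author/quotes/9810.Albert_Einstein").read()
    bswebpage = BeautifulSoup(urlpage)
    # Each quote lives in a <div class="quoteText">
    results = bswebpage.findAll("div", {'class': "quoteText"})
    for result in results:
        print("\nQuotes\n")
        # contents[0] is the div's first text node, i.e. the quote itself; the
        # author name comes later in a child element. The curly quotes appear as
        # the literal entities &ldquo; / &rdquo; here, so strip them (and the
        # period before the closing mark). The dot is escaped and optional so
        # other punctuation such as "?" is kept.
        print(sub(r"&ldquo;|\.?&rdquo;", "", "".join(result.contents[0:1]).strip()))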
    

    Results on my side:

    Quotes
    
    Two things are infinite: the universe and human stupidity; and I'm not sure about the universe
    
    Quotes
    
    There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle
    
    Quotes
    
    I am enough of an artist to draw freely upon my imagination. Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world
    ...
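The script above needs Python 2 and BeautifulSoup 3. As a rough Python 3 sketch of the same idea (keep only the text that comes before the first child element of each `<div class="quoteText">`), the standard library's `html.parser` can stand in for BeautifulSoup and `urllib.request` for `urllib2`. It assumes the page markup still matches what the question shows and that the page is served as UTF-8; `QuoteTextParser`, `extract_quotes` and `fetch_quotes` are hypothetical names, not part of any library.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class QuoteTextParser(HTMLParser):
    """Collect the text directly inside <div class="quoteText"> that appears
    before the div's first child element (where the quote itself sits)."""

    def __init__(self):
        # convert_charrefs=True turns &ldquo;, &#8213; etc. into characters.
        super().__init__(convert_charrefs=True)
        self.quotes = []
        self._capturing = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._capturing:
            # Any child element (e.g. <br>, <span>) ends the leading text run.
            self._finish()
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if "quoteText" in classes:
                self._capturing = True

    def handle_endtag(self, tag):
        if self._capturing:
            self._finish()

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)

    def _finish(self):
        # Normalise whitespace, then drop the surrounding curly quotes while
        # keeping any punctuation inside them.
        text = " ".join("".join(self._buffer).split()).strip("\u201c\u201d")
        if text:
            self.quotes.append(text)
        self._capturing = False
        self._buffer = []


def extract_quotes(html_text):
    parser = QuoteTextParser()
    parser.feed(html_text)
    parser.close()
    return parser.quotes


def fetch_quotes(url="https://www.goodreads.com/author/quotes/9810.Albert_Einstein"):
    # Needs network access; assumes the response body is UTF-8 encoded.
    with urlopen(url) as response:
        return extract_quotes(response.read().decode("utf-8"))
```

Calling `fetch_quotes()` would return a plain list of strings. Like the BeautifulSoup 3 version, it stops at the first child element inside the div, so a quote whose body contains inline markup is cut off at that point.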