python html screen-scraping

Scraping text from HTML5 website using Python


I need a way to scrape just the text from a website using Python. I have installed BeautifulSoup 4, HTML Requests, and NLTK, but I can't figure out how to do the actual scraping.

I really need a simple snippet of code that I can plug any URL into and get the plain text. I'm trying to get it from this website.


Solution

  • BeautifulSoup can extract all the text from a page easily. The following example extracts the text inside the <body>...</body> section.

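    # Python 3: urlopen lives in urllib.request and works as a context manager.
    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    
    url = 'https://developer.valvesoftware.com/wiki/Hammer_Selection_Tool'
    # Download the page; the response is closed when the with block exits.
    with urlopen(url) as h:
        # Name the parser explicitly so bs4 does not have to guess one.
        soup = BeautifulSoup(h.read(), 'html.parser')
    
    # Print only the text inside <body>, with all tags stripped.
    print(soup.body.get_text())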
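
    One caveat with calling get_text() on the body: it also returns the contents of any <script> and <style> elements, and it keeps the page's original whitespace. Since the question asks for something you can plug any URL into, here is a minimal sketch of a reusable version that removes those elements first. It assumes Python 3 and bs4 with the built-in 'html.parser'; the names html_to_text and url_to_text are made up for illustration and are not part of BeautifulSoup.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def html_to_text(html):
    """Return the visible text of an HTML document, one fragment per line.

    Hypothetical helper, not a BeautifulSoup API.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # get_text() would include JavaScript and CSS source, so remove
    # <script> and <style> elements from the tree before extracting.
    for tag in soup(['script', 'style']):
        tag.decompose()
    # html.parser does not add a <body> to fragments, so fall back to
    # the whole document when there is none.
    root = soup.body if soup.body is not None else soup
    # strip=True trims each text fragment and drops whitespace-only ones.
    return root.get_text(separator='\n', strip=True)


def url_to_text(url):
    """Download a page and return its visible text (hypothetical helper)."""
    with urlopen(url) as response:
        return html_to_text(response.read())


# Offline demonstration on an inline document (no network access needed).
sample = (
    '<html><head><style>p {color: red}</style></head>'
    '<body><h1>Title</h1><script>var x = 1;</script>'
    '<p>Hello <b>world</b></p></body></html>'
)
print(html_to_text(sample))
```

    For a live page you would call url_to_text(url) with the URL from the snippet above. Some sites reject requests that lack a browser-like User-Agent header, in which case the download step needs adjusting; the text extraction stays the same.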