I need a way to scrape just the text from a website using Python. I have installed BeautifulSoup 4, HTML Requests, and NLTK, but I can't figure out how to do the scraping itself.
I really need a simple snippet of code that I can plug any URL into and get the plain text back. I'm trying to get it from this website.
BeautifulSoup can extract all the text from a page easily. The following example extracts the text inside the <body>...</body> section.
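import urllib.request
from bs4 import BeautifulSoup

url = 'https://developer.valvesoftware.com/wiki/Hammer_Selection_Tool'
# Fetch the page; the response is closed automatically when the block exits
with urllib.request.urlopen(url) as h:
    # Name the parser explicitly so bs4 does not have to guess one
    soup = BeautifulSoup(h.read(), 'html.parser')
# Print only the text found inside <body>...</body>
print(soup.body.get_text())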
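Since the question asks for something you can plug any URL into, here is a rough sketch of the same idea wrapped in reusable functions. The names html_to_text and url_to_text are my own (hypothetical, not part of BeautifulSoup), and dropping <script> and <style> elements is an assumption about what counts as "plain text"; adjust that list to taste.

```python
import urllib.request
from bs4 import BeautifulSoup


def html_to_text(html):
    """Return the readable text of an HTML document (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    # Assumption: script and style contents are code, not text worth keeping
    for tag in soup.find_all(['script', 'style']):
        tag.decompose()
    # Fall back to the whole document if the page has no <body>
    root = soup.body if soup.body is not None else soup
    # One text fragment per line, with surrounding whitespace stripped
    return root.get_text(separator='\n', strip=True)


def url_to_text(url):
    """Download a page and return its readable text (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return html_to_text(response.read())


# Example (needs network access):
# print(url_to_text('https://developer.valvesoftware.com/wiki/Hammer_Selection_Tool'))
```

Keeping the download separate from the parsing means you can also feed html_to_text a string you already have, for example the .text of a Requests response.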