I am trying to scrape the table on http://apps2.eere.energy.gov/wind/windexchange/economics_tools.asp
By default, the table shows only 5 entries. I use dryscrape and BeautifulSoup as follows:
import dryscrape
from bs4 import BeautifulSoup
myurl = 'http://apps2.eere.energy.gov/wind/windexchange/economics_tools.asp'
session = dryscrape.Session()
session.visit(myurl)
response = session.body()
soup = BeautifulSoup(response,'lxml')
table = soup.find_all("td")
But this only returns the default 5 entries of that table. How can I get all rows in this table?
Thank you very much!
You don't need dryscrape for this particular page. The entire table you are trying to get is already present in the page's HTML source, so you can fetch it with requests and parse it directly:
from bs4 import BeautifulSoup
import requests
myurl = 'http://apps2.eere.energy.gov/wind/windexchange/economics_tools.asp'
soup = BeautifulSoup(requests.get(myurl).text,'lxml')
table = soup.find_all("td")
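If you would rather have the cells grouped by row than one flat list of td tags, a minimal sketch along these lines may help. The sample markup and the helper name table_rows are made up for illustration (I have not reproduced the real page's structure here), and it uses the built-in 'html.parser' only so the snippet runs without lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page source.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Type</th></tr>
  <tr><td>Tool A</td><td>Model</td></tr>
  <tr><td>Tool B</td><td>Calculator</td></tr>
</table>
"""

def table_rows(html):
    """Return one list of cell strings per <tr> that contains <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # rows with only <th> cells (headers) are skipped
            rows.append(cells)
    return rows

rows = table_rows(SAMPLE_HTML)
```

With the real page you would pass requests.get(myurl).text instead of SAMPLE_HTML.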
Alternatively, with your current setup:
table = session.xpath('//td')
will give you the td nodes directly from the dryscrape session, in which case you don't need BeautifulSoup at all.
session.body() gives you the HTML that is currently loaded in the DOM. The JavaScript on the page acts on the DOM and changes what it contains, so body() reflects only what is there at that moment (here, the 5 rows being displayed). Because of this, you could write a for loop that clicks the next button in the session and, after each iteration, feeds the body into BeautifulSoup, but that seems unnecessary to me.
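For completeness, the click-through loop could be sketched roughly as follows. This is only a sketch under assumptions: the helper name collect_pages and the XPath for the next button are hypothetical (I have not inspected the page's pager markup), and it assumes the click updates the DOM before body() is read again. It relies only on the session's body() and at_xpath() and on the node's click().

```python
def collect_pages(session, next_xpath, max_pages=50):
    """Collect session.body() once per page, clicking 'next' in between.

    Stops when the button is not found, when a click no longer changes
    the HTML, or when max_pages snapshots have been collected.
    """
    pages = [session.body()]
    for _ in range(max_pages - 1):
        button = session.at_xpath(next_xpath)  # first match, or None
        if button is None:
            break
        button.click()
        html = session.body()
        if html == pages[-1]:  # nothing changed, so assume the last page
            break
        pages.append(html)
    return pages

# Usage with the session from the question (the XPath is a guess):
# pages = collect_pages(session, '//a[text()="Next"]')
# soups = [BeautifulSoup(html, 'lxml') for html in pages]
```

Each collected body can then be fed into BeautifulSoup exactly as in the question.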