Tags: javascript, python, iframe, beautifulsoup, dryscrape

dryscrape and BeautifulSoup to get all rows in a js rendered iframe


I am trying to scrape the table on http://apps2.eere.energy.gov/wind/windexchange/economics_tools.asp


The table shows 5 entries by default. I use dryscrape and BeautifulSoup as follows:

import dryscrape
from bs4 import BeautifulSoup
myurl = 'http://apps2.eere.energy.gov/wind/windexchange/economics_tools.asp'
session = dryscrape.Session()
session.visit(myurl)
response = session.body()
soup = BeautifulSoup(response,'lxml')
table = soup.find_all("td")

But this only returns the default 5 entries of that table. How can I get all rows in this table?

Thank you very much!


Solution

  • You don't need dryscrape for this particular page. Because the entire table you are trying to get is already in the page's HTML source, you can just do:

    from bs4 import BeautifulSoup
    import requests
    
    myurl = 'http://apps2.eere.energy.gov/wind/windexchange/economics_tools.asp'
    soup = BeautifulSoup(requests.get(myurl).text,'lxml')
    table = soup.find_all("td")
    

    Alternatively, with your current setup:

    table = session.xpath('//td')
    

    will give you the td nodes from the dryscrape session, in which case you don't need BeautifulSoup.

    session.body() gives you the HTML that is currently loaded into the DOM. The JavaScript on the page acts on that DOM and changes what is in it, so you could write a for loop that clicks the next button in the session and, after each iteration, feeds the body into BeautifulSoup, but that seems unnecessary to me.
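If you did want to go the click-through route, a minimal sketch could look like the following. It is not tested against the page in the question. The function name `collect_all_rows`, the default row XPath, and the `max_pages` guard are all made up for illustration, and you would have to find the XPath of the real "next" control yourself. It assumes the session behaves like a dryscrape session (`xpath()`, `at_xpath()`, and nodes with `xpath()`, `text()` and `click()`), and that a relative XPath such as `.//td` works on a node. Because a paging control often stays in the page on the last page, the loop stops when a page yields no new rows rather than waiting for the button to disappear.

```python
def collect_all_rows(session, next_xpath, row_xpath="//table//tr", max_pages=100):
    """Page through a JS-driven table and return every distinct row.

    `session` is assumed to behave like a dryscrape.Session.
    `next_xpath` is a hypothetical XPath for the page's "next" control.
    """
    seen = set()
    rows = []
    for _ in range(max_pages):  # hard upper bound so the loop always ends
        new_rows = 0
        for tr in session.xpath(row_xpath):
            # One tuple of cell texts per row; header rows without <td> give ().
            cells = tuple(td.text().strip() for td in tr.xpath(".//td"))
            if cells and cells not in seen:
                # Caveat: rows that are genuinely identical are kept only once.
                seen.add(cells)
                rows.append(cells)
                new_rows += 1
        next_button = session.at_xpath(next_xpath)
        # Stop when clicking changed nothing, or when there is no next control.
        if new_rows == 0 or next_button is None:
            break
        next_button.click()
    return rows
```

With the session from the question you would call something like `collect_all_rows(session, '//a[contains(@class, "next")]')`, where that XPath is only a placeholder. If the page redraws the table asynchronously after a click, you would also need to wait before reading the rows again.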
