javascript python screen-scraping beautifulsoup

Python BeautifulSoup on javascript tables with multiple pages


I used to have a Python script that pulled data from the table below using Mechanize and BeautifulSoup. However, the site recently changed so that the table is rendered with JavaScript, and I'm having trouble working with it because the table spans multiple pages.

http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=0&type=8&season=2011&month=0&season1=&ind=0&team=25&players=0

For example, in the link above, how could I grab the data from both page 1 and page 2 of the table? FWIW, the URL doesn't change.


Solution

  • Your best bet is to run a headless browser such as PhantomJS, which understands the intricacies of JavaScript, the DOM, and so on. The catch is that you will have to write your code in JavaScript; the benefit is that you can do whatever you want. Parsing HTML with BeautifulSoup is fine for a while but becomes a headache in the long term. So why scrape when you can access the DOM directly?
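
If you would rather stay in Python, the same headless-browser idea can be sketched with Selenium instead of PhantomJS. This is a rough sketch, not a tested scraper for this site. It assumes Selenium 4 and Firefox with geckodriver are installed. The `next_selector` value is a hypothetical CSS selector that you would have to look up in the page's markup. The rows are parsed with the standard-library `html.parser` only to keep the example dependency-free; BeautifulSoup would work on the same `page_source` string.

```python
import time
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_rows(html):
    """Return the table rows found in an HTML string as lists of cell text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def scrape_pages(driver, next_selector, max_pages=10, pause=0.0):
    """Read the rendered table, click "next", and repeat.

    Stops when there is no "next" link, when a click leaves the rows
    unchanged (taken to mean the last page), or after max_pages.
    """
    all_rows, previous = [], None
    for _ in range(max_pages):
        page_rows = parse_rows(driver.page_source)
        if page_rows == previous:
            break
        all_rows.extend(page_rows)
        previous = page_rows
        # "css selector" is the string value of selenium's By.CSS_SELECTOR
        links = driver.find_elements("css selector", next_selector)
        if not links:
            break
        links[0].click()
        # Crude wait for the table to refresh; a WebDriverWait on a
        # page-specific condition would be more robust.
        time.sleep(pause)
    return all_rows


def run_headless(url, next_selector, pause=2.0):
    """Drive headless Firefox (assumes selenium and geckodriver are installed)."""
    from selenium import webdriver  # third-party, imported lazily

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        return scrape_pages(driver, next_selector, pause=pause)
    finally:
        driver.quit()
```

You would call `run_headless(url, next_selector)` with the leaders URL from the question and whatever selector matches the grid's "next page" control. Because the URL never changes, the loop detects the last page by comparing each page's rows with the previous page's rows.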