Search code examples
pythongoogle-app-enginescreen-scraping

Setting up a python screen scraper that could work on Google App engine


I am looking to setup a automated screen scraper that will run on Google app engine using python. I want it to scrape the site and put the specified results into a Entity in app engine. I am looking for some directions on what to use. I have seen beautifulsoup but wonder if people could recommend anything else that could run on Google App engine.


Solution

  • Beautifulsoup runs fine on App Engine (just make sure to use 3.0.8, not the iffy 3.1.0). The main alternative, I think, would be html5lib -- I haven't tries it on App Engine but I believe it does run there (quite slowly -- if that's a problem I think you need to stick with BeautifulSoup), e.g. this service runs on App Engine and is based on html5lib.