I am looking to set up an automated screen scraper that will run on Google App Engine using Python. I want it to scrape a site and put the specified results into an entity in App Engine. I am looking for some direction on what to use. I have seen BeautifulSoup, but I wonder if people could recommend anything else that can run on Google App Engine.
BeautifulSoup runs fine on App Engine (just make sure to use 3.0.8, not the iffy 3.1.0). The main alternative, I think, would be html5lib. I haven't tried it on App Engine, but I believe it does run there, e.g. this service runs on App Engine and is based on html5lib. It is quite slow, though, so if speed is a problem I think you need to stick with BeautifulSoup.
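For what it's worth, here is a rough sketch of the "parse the page, pull out the fields" step. To keep it runnable without installing anything, it uses the standard library's `html.parser` as a stand-in for BeautifulSoup or html5lib, and it assumes a Python 3 runtime. The names `TitleAndLinks`, `scrape` and `SAMPLE` are made up for illustration, and the fetch and datastore steps are left out; on App Engine you would fetch the page yourself and copy the resulting dict onto your own entity.

```python
from html.parser import HTMLParser


class TitleAndLinks(HTMLParser):
    """Collect the page title and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # attrs is a list of (name, value) pairs; skip anchors without href
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears inside <title> is kept
        if self._in_title:
            self.title += data


def scrape(html):
    """Return a plain dict with the fields you would copy onto an entity."""
    parser = TitleAndLinks()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "links": parser.links}


# Hypothetical sample page standing in for the fetched HTML
SAMPLE = (
    "<html><head><title> Demo </title></head>"
    "<body><a href='/a'>A</a><a>no href</a><a href='/b'>B</a></body></html>"
)
record = scrape(SAMPLE)
```

With BeautifulSoup the extraction itself would be shorter, but the overall shape (fetch, parse, build a record, store it) should stay the same.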