RapidMiner Not Saving Crawl Web Results

I am trying to crawl the reviews for a particular movie from the IMDB website. For this I am using the Crawl Web operator, which I have embedded inside a Loop operator because there are 74 pages.

Attached are images of the configuration. Please help, I am badly stuck on this.

URL for Crawl Web is: http://www.imdb.com/title/tt0454876/reviews?start=%{pagePos}

Solution

When I tried it, I got 403 Forbidden errors because the IMDB server thinks I am a robot. Using Loop with Crawl Web is bad practice because the Loop operator does not wait between iterations, so the requests hit the server in rapid succession.

This process can be reduced to just the Crawl Web operator. The key parameters are:

URL - set this to http://www.imdb.com/title/tt0454876
max pages - set this to 79 or whatever number you need
max page size - set this to 1000
crawling rules - set these to the ones you specified
output dir - choose a folder to store things in

This works because the Crawl Web operator starts at the given URL, follows every link that matches the follow rules, and stores those pages that also match the store rules. Successive visits are delayed by 1000 ms (the delay parameter) to avoid being blocked as a robot by the server.
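
To make the idea concrete, here is a rough Python sketch of that kind of crawl: follow links that match one rule, store pages that match another, and pause between requests. It is not RapidMiner's implementation. The function name, the regex-based link extraction, the injectable fetch and sleep arguments, and the choice to let max_pages cap the number of pages visited are all my own assumptions for illustration.

```python
import re
import time
from collections import deque
from urllib.parse import urljoin

# Naive link extraction; an assumption for this sketch, not a full HTML parser.
HREF = re.compile(r'href="([^"]+)"')


def crawl(start_url, fetch, follow_rule, store_rule,
          max_pages=79, delay_s=1.0, sleep=time.sleep):
    """Breadth-first crawl from start_url.

    fetch(url) must return the page body as a string. Links whose absolute
    URL matches follow_rule are queued; visited pages whose URL matches
    store_rule are kept. sleep(delay_s) is called between requests.
    """
    follow = re.compile(follow_rule)
    store = re.compile(store_rule)
    queue = deque([start_url])
    seen = {start_url}
    stored = {}
    visited = 0

    while queue and visited < max_pages:
        url = queue.popleft()
        if visited:
            sleep(delay_s)  # wait before every request except the first
        html = fetch(url)
        visited += 1

        if store.search(url):
            stored[url] = html

        for href in HREF.findall(html):
            link = urljoin(url, href)  # resolve relative links
            if link not in seen and follow.search(link):
                seen.add(link)
                queue.append(link)

    return stored
```

Passing fetch and sleep as arguments lets you try the logic against a small in-memory set of pages before pointing it at a real site. A plain Loop around single requests corresponds to this code with the sleep call removed, which is what gets the requests rejected.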

Hope this gets you going as a start.