c# python watin web-crawler html-parsing

What is the best way to crawl login-based sites?


I have to automate a file download from a website (similar to, say, yahoomail.com). To reach the page that has the download link, I have to log in, move from page to page supplying parameters such as dates, and finally click the download link.

I am thinking of three approaches:

  1. Using WatiN: develop a Windows service that periodically executes some WatiN code to traverse the pages and download the file.

  2. Using AutoIt (I don't know much about it).

  3. Using a simple HTML parsing technique (this raises several questions, e.g., how do I maintain a session after logging in, and how do I log out afterwards?).


Solution

  • Try a Selenium script, automated with Selenium Remote Control.
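
On the session question raised under the third approach: if the site uses ordinary cookie-based sessions, a plain HTTP client that stores cookies may be enough. Below is a minimal sketch using only the Python standard library. The paths `/login`, `/report/download` and `/logout`, the form field names `username`, `password`, `from` and `to`, and the helper names are all hypothetical; the real ones would have to be read from the site's login form and download link.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_session():
    """Return an opener that stores cookies and sends them back on later requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def post_form(opener, url, fields):
    """POST url-encoded form fields and return the response body as bytes."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with opener.open(url, data=data, timeout=30) as resp:
        return resp.read()


def fetch(opener, url, params=None):
    """GET a URL (with optional query parameters) and return the body as bytes."""
    if params:
        url = url + "?" + urllib.parse.urlencode(params)
    with opener.open(url, timeout=30) as resp:
        return resp.read()


def download_report(base_url, username, password, start, end, dest):
    """Log in, download one file, log out. All paths and field names are assumed."""
    opener, _jar = make_session()
    # Logging in: the session cookie from the response is kept in the jar.
    post_form(opener, base_url + "/login",
              {"username": username, "password": password})
    # The same opener sends the cookie back, so this request is authenticated.
    body = fetch(opener, base_url + "/report/download",
                 {"from": start, "to": end})
    with open(dest, "wb") as fh:
        fh.write(body)
    # Logging out is just another request made with the same session.
    fetch(opener, base_url + "/logout")
    return len(body)
```

This only works when the pages do not depend on JavaScript. If the login form carries hidden fields (for example an anti-forgery token), those would need to be parsed from the form page and posted along with the credentials; for script-heavy sites a browser-driving tool such as Selenium or WatiN is the more likely fit.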