
How to screen-scrape 3 levels deep in C#


I have a URL (http://www2.anac.gov.br/aeronaves/cons_rab.asp) where I need to post form data programmatically: I want code, not a person, to select the correct radio button and click the submit button. If you go to the URL above, the radio button I need selected is "modelo." Clicking the "ok" button returns a page with 20K+ links on it.
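Without a browser, "selecting a radio button and clicking OK" just means sending an HTTP POST whose body contains the radio button's name=value pair. The sketch below uses HttpClient, which supersedes HttpWebRequest in current .NET. It assumes the radio group is named `tipo` and the submit button `ok`; both names are hypothetical placeholders, so read the real `name` attributes and the form's `action` from the page's HTML.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class RabSearch
{
    // Builds the body the browser would send when "modelo" is selected and OK is clicked.
    // HYPOTHETICAL field names: check the <form> on cons_rab.asp for the real ones.
    public static FormUrlEncodedContent BuildModelSearch() =>
        new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("tipo", "modelo"), // assumed radio group name/value
            new KeyValuePair<string, string>("ok", "ok")        // assumed submit button name/value
        });

    // Posts the form and returns the HTML of the results page (the one with the 20K+ links).
    public static async Task<string> PostAsync(HttpClient client, string url)
    {
        using HttpResponseMessage response = await client.PostAsync(url, BuildModelSearch());
        response.EnsureSuccessStatusCode(); // throws HttpRequestException on 4xx/5xx
        return await response.Content.ReadAsStringAsync();
    }
}
```

Since the form posts back to itself, `url` would be the same cons_rab.asp address.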

I then want to traverse all 20K+ links and scrape the page each one points to. Finally, I will take the information from those last pages and put the data in an Excel spreadsheet.
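One simple way to get the scraped rows into Excel without an Excel library is to write a CSV file, which Excel opens directly. This sketch quotes fields in the RFC 4180 style and writes a UTF-8 byte-order mark so Excel recognizes accented Portuguese text. Under some regional settings (Brazilian Portuguese among them) Excel may expect `;` rather than `,` as the separator, so the delimiter is an assumption you may need to change.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

public static class CsvExport
{
    // Quotes a field only if it contains a separator, a quote, or a line break;
    // embedded quotes are doubled.
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
            return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    public static string ToCsvLine(IEnumerable<string> fields) =>
        string.Join(",", fields.Select(Escape));

    // Writes a header line, then one line per scraped record.
    // UTF-8 with BOM so Excel detects the encoding.
    public static void Write(string path, IEnumerable<string> header,
                             IEnumerable<IEnumerable<string>> rows)
    {
        using var writer = new StreamWriter(path, false,
            new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
        writer.WriteLine(ToCsvLine(header));
        foreach (var row in rows)
            writer.WriteLine(ToCsvLine(row));
    }
}
```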

What would be the best way to get to the third page and scrape its information? I've researched the HTML Agility Pack, HttpWebRequest, and the WebBrowser control, but I'm not sure which one to use.
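The WebBrowser control is mainly useful when a page has to execute JavaScript; if the requests can be reproduced directly, plain HTTP plus an HTML parser is lighter and faster. A sketch of one reusable client for all three levels follows. It assumes the classic ASP site tracks state in a session cookie (not confirmed), so a shared CookieContainer is kept; the 30-second timeout and the User-Agent string are arbitrary choices.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class ScraperClient
{
    // Create ONE client and reuse it for every request: the shared CookieContainer
    // carries any session cookie set on the first page into the later posts.
    public static HttpClient Create(CookieContainer cookies)
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = cookies,
            UseCookies = true,
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
        var client = new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(30) };
        client.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (compatible; rab-scraper)");
        return client;
    }
}
```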

UPDATE: On the first page, I must select a radio button and then simulate a button click that posts the form back to itself. The resulting page contains the 20K+ links I'm interested in; however, each link is a JavaScript function call. The JS function takes the link text, places it in a textbox, and then clicks the submit button. How the hell do I automate that?
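Because the JavaScript only copies the link text into a textbox and submits the form, you can skip the browser: collect the link texts from the results page, then post each one yourself as the textbox value. This sketch assumes links shaped like `<a href="javascript:...">TEXT</a>` with a double-quoted href, and a textbox named `textoModelo`; both are hypothetical. The regex is a stand-in for a real HTML parser such as the HTML Agility Pack.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;

public static class JsLinks
{
    // Captures the inner text of every <a> whose href starts with "javascript:".
    // ASSUMED markup; adjust the pattern to the real results page.
    private static readonly Regex LinkText = new Regex(
        @"<a\b[^>]*\bhref\s*=\s*""javascript:[^""]*""[^>]*>(.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> ExtractLinkTexts(string html)
    {
        var texts = new List<string>();
        foreach (Match m in LinkText.Matches(html))
            texts.Add(WebUtility.HtmlDecode(m.Groups[1].Value).Trim()); // "&amp;" -> "&"
        return texts;
    }

    // Does what the JS function does in the browser: put the link text in the
    // textbox and submit. "textoModelo" is a HYPOTHETICAL field name.
    public static FormUrlEncodedContent BuildDetailRequest(string linkText) =>
        new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("textoModelo", linkText)
        });
}
```

Posting `BuildDetailRequest(text)` to the form's action URL with the same HttpClient should return the third-level page for that link.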


Solution

  • You should be able to do what you want with the HTML Agility Pack.

    You should also consider iRobot.

    ALSO:

    1) What have you tried?

    2) How far did you get? What problems/questions did you encounter?
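Tying the levels together, the third level is a loop over 20K+ link texts, so it helps to throttle requests and tolerate individual failures. In this sketch the caller supplies `fetchDetail` (the POST for one link text) and `parse` (pulling fields out of the detail page, for example with the HTML Agility Pack); the delay and the skip-on-error policy are assumptions, not something the site is known to require.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class Crawler
{
    // Visits each link text in turn, waiting `delay` after each request so the
    // crawl does not hammer the server. Network errors and timeouts are logged
    // and that link is skipped; other exceptions propagate.
    public static async Task<List<T>> CrawlAsync<T>(
        IEnumerable<string> linkTexts,
        Func<string, Task<string>> fetchDetail,
        Func<string, T> parse,
        TimeSpan delay)
    {
        var results = new List<T>();
        foreach (string text in linkTexts)
        {
            try
            {
                string html = await fetchDetail(text);
                results.Add(parse(html));
            }
            catch (Exception ex) when (ex is HttpRequestException || ex is TaskCanceledException)
            {
                Console.Error.WriteLine($"Skipping {text}: {ex.Message}");
            }
            if (delay > TimeSpan.Zero)
                await Task.Delay(delay);
        }
        return results;
    }
}
```

A typical call would pass the link texts from the results page, a lambda that posts one text with the shared HttpClient and returns the response HTML, a parser for the detail page, and a delay such as `TimeSpan.FromMilliseconds(500)`; the resulting rows can then be written to a spreadsheet-friendly file.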