c#, .net, screen-scraping, applet

.NET, scrape dynamic (Java App?) webpage for information?


I am trying to get some information from a website. The info I need is on the missouri.edu site (so it's publicly available). Here is the process I need to accomplish:

- Navigate to https://webapps.missouri.edu/ODDSearchEngine/oddsearch
- Search for a department name like "business"
- Click any of the department names, like "Business College, Advancement"
- Programmatically view the source of the page that is output after clicking "Business College, Advancement"

I would like to be able to get the source of each page for each department under business (or whatever department I put in, like "Accounting").

Is this possible with a Windows program? It looks like the "ODDSearchEngine" that runs this is a Java applet. I'm not sure how to interface with it to get the pages.

For reference, if I feed the address that the ODDSearchEngine outputs into my existing program, it returns the source code of the Search page with two "java.lang.NullPointerException" errors.

Is there an easy way to get this information through .NET?


Solution

  • I recently used WatiN for a similar task (though that one required logging in and tracking a cookie). WatiN basically simulates a user visiting a website by driving a real browser. It's probably overkill (and slow) for what you need.

    Another alternative I played around with was HttpWebRequest/HttpWebResponse. This seems like it should satisfy your needs. You can also use HTML Agility Pack to work with the HTML you receive.
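
The HttpWebRequest approach could look roughly like the sketch below. It is a minimal example under assumptions, not a tested solution for this particular site: the class name `PageFetcher`, the helper `BuildSearchUrl`, the user-agent string, and the query parameter name are all hypothetical, and the real form field names would have to be read from the search page's HTML. Sharing one `CookieContainer` across requests is a guess at what the site may need; if it keeps per-session state, a request without the session cookie could plausibly be what triggers the "java.lang.NullPointerException" output.

```csharp
using System;
using System.IO;
using System.Net;

public static class PageFetcher
{
    // Builds a GET URL with a single query parameter.
    // paramName is a placeholder: inspect the real search form to find
    // the actual field names the site expects.
    public static string BuildSearchUrl(string baseUrl, string paramName, string term)
    {
        return baseUrl + "?" + Uri.EscapeDataString(paramName)
                       + "=" + Uri.EscapeDataString(term);
    }

    // Downloads the raw HTML of a page. Passing the same CookieContainer
    // to every call keeps any session cookie between requests.
    public static string FetchHtml(string url, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.CookieContainer = cookies;
        // Hypothetical user-agent string; some servers reject requests without one.
        request.UserAgent = "Mozilla/5.0 (compatible; DepartmentScraper/0.1)";

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}

// Example use (the parameter name "query" is an assumption, not the site's real field):
//   var cookies = new CookieContainer();
//   string searchPage = PageFetcher.FetchHtml(
//       "https://webapps.missouri.edu/ODDSearchEngine/oddsearch", cookies);
//   string resultsUrl = PageFetcher.BuildSearchUrl(
//       "https://webapps.missouri.edu/ODDSearchEngine/oddsearch", "query", "business");
//   string resultsHtml = PageFetcher.FetchHtml(resultsUrl, cookies);
```

Fetching the search page first and only then requesting the results mimics what a browser does, so the server sees the same cookie on both requests. If the search form submits via POST rather than GET, the request would need `Method = "POST"` and a form-encoded body instead. The returned HTML string can then be handed to HTML Agility Pack to pull out the department links.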