I'm developing a web crawler in C# (.NET) that works like this:
Step 1: Visit the site's main page (let's call it Main.aspx).
Step 2: Use HttpWebRequest to get the form page (let's call it Form.aspx).
Step 3: Post the form to another page and get the results (let's call it Results.aspx).
It's pretty straightforward as web crawling goes.
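For reference, the three steps might look roughly like this with `HttpWebRequest`. This is a minimal sketch that assumes all three pages share one base URL; `SiteCrawler`, `Get`, `Post` and `EncodeForm` are hypothetical names. The important detail is the single shared `CookieContainer`: cookies the server sets through `Set-Cookie` headers carry over from step to step, but cookies created by JavaScript on Main.aspx never reach it, because `HttpWebRequest` does not run scripts.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper for the Main.aspx -> Form.aspx -> Results.aspx flow.
// One CookieContainer is shared by every request, so cookies set by the
// server (Set-Cookie headers) are sent back on later requests.
// Cookies created by JavaScript are NOT captured, since no script runs here.
public class SiteCrawler
{
    private readonly Uri _baseUri; // should end with '/' so relative paths resolve under it

    public CookieContainer Cookies { get; } = new CookieContainer();

    public SiteCrawler(Uri baseUri)
    {
        _baseUri = baseUri;
    }

    // Steps 1 and 2: GET a page relative to the base URL.
    public string Get(string relativePath)
    {
        var request = (HttpWebRequest)WebRequest.Create(new Uri(_baseUri, relativePath));
        request.CookieContainer = Cookies;
        return ReadBody(request);
    }

    // Step 3: POST url-encoded form fields.
    public string Post(string relativePath, string formBody)
    {
        var request = (HttpWebRequest)WebRequest.Create(new Uri(_baseUri, relativePath));
        request.CookieContainer = Cookies;
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        byte[] data = Encoding.UTF8.GetBytes(formBody);
        request.ContentLength = data.Length;
        using (Stream body = request.GetRequestStream())
        {
            body.Write(data, 0, data.Length);
        }
        return ReadBody(request);
    }

    // Builds "name1=value1&name2=value2" with each name and value percent-encoded.
    public static string EncodeForm(params (string Name, string Value)[] fields)
    {
        var sb = new StringBuilder();
        foreach (var (name, value) in fields)
        {
            if (sb.Length > 0) sb.Append('&');
            sb.Append(Uri.EscapeDataString(name)).Append('=').Append(Uri.EscapeDataString(value));
        }
        return sb.ToString();
    }

    // Reads the whole response body as a string.
    private static string ReadBody(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage would be `Get("Main.aspx")`, then `Get("Form.aspx")`, then `Post("Results.aspx", EncodeForm(...))` on one instance. ASP.NET Web Forms pages usually also expect hidden fields such as `__VIEWSTATE` to be copied from Form.aspx into the POST body.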
The problem is that I can't access Form.aspx unless I first set a number of cookies, all of which are generated by JavaScript on Main.aspx.
Whenever I try to get Form.aspx directly, I get redirected to Main.aspx. The code that generates the cookies is more than 20 KB and absolutely messy and convoluted, and it uses a lot of "document." references, which rules out simply running it with JINT or Javascript.net, since neither provides a browser DOM.
So after a lot of research I found that a headless browser is what I'm looking for. I tried several, but they all seem to add a lot of complexity. I already have a class library project containing all my web crawlers; I just want one more DLL to make this work. Any suggestions?
I've tried to be as clear as possible. If anything is unclear, please ask in the comments before downvoting.
Use a .NET binding for PhantomJS, which is a headless WebKit browser. You might also consider a full-blown automation framework such as Selenium, which is made for testing.
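A sketch of the headless-browser route, under stated assumptions. PhantomJS is no longer maintained and newer Selenium .NET releases no longer ship a PhantomJS driver, so this swaps in headless Chrome driven by Selenium WebDriver (it assumes the `Selenium.WebDriver` NuGet package and a local Chrome install). The idea is to let a real browser run Main.aspx's script, then copy the resulting cookies into a `CookieContainer` so the existing `HttpWebRequest` code can fetch Form.aspx. `HeadlessCookies`, `Fetch` and `ToCookieContainer` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

// Hypothetical helper: run Main.aspx in headless Chrome, then hand its cookies
// to HttpWebRequest. Requires the Selenium.WebDriver NuGet package and Chrome.
// Both System.Net and OpenQA.Selenium define "Cookie", so both are fully qualified.
public static class HeadlessCookies
{
    public static CookieContainer Fetch(Uri mainPage)
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless");

        using (IWebDriver driver = new ChromeDriver(options))
        {
            // With the default page-load strategy, GoToUrl returns after the page
            // has loaded, so inline scripts that set cookies have normally run.
            // Scripts that set cookies later (timers, XHR) would need an explicit wait.
            driver.Navigate().GoToUrl(mainPage);
            return ToCookieContainer(mainPage, driver.Manage().Cookies.AllCookies);
        }
    }

    // Copies Selenium cookies into a System.Net.CookieContainer, scoping them
    // to the host of 'site' rather than trusting each cookie's Domain value.
    public static CookieContainer ToCookieContainer(Uri site, IEnumerable<OpenQA.Selenium.Cookie> cookies)
    {
        var jar = new CookieContainer();
        foreach (OpenQA.Selenium.Cookie c in cookies)
        {
            string path = string.IsNullOrEmpty(c.Path) ? "/" : c.Path;
            jar.Add(new System.Net.Cookie(c.Name, c.Value, path, site.Host));
        }
        return jar;
    }
}
```

Assigning the returned container to `HttpWebRequest.CookieContainer` keeps the rest of the crawler unchanged; only the cookie bootstrap goes through the browser.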
What you are asking for is not simple, though. You are asking for a lot of abstraction so that your app can stay as simple as it is now.
If you didn't mind a "head-ful" browser, you could also use the Windows Forms "WebBrowser" control or remote-control Internet Explorer through COM.
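A rough sketch of the WebBrowser-control route, assuming a Windows-only Windows Forms project (the control hosts Internet Explorer). It navigates to Main.aspx, waits for `DocumentCompleted`, reads `Document.Cookie` (the same `name=value; name2=value2` string that `document.cookie` returns, so HttpOnly cookies are not included, though script-created cookies never are HttpOnly) and parses it into a `CookieContainer`. The URL and the `BrowserCookies` names are hypothetical placeholders.

```csharp
using System;
using System.Net;
using System.Windows.Forms;

// Hypothetical helper: let the WinForms WebBrowser control (Internet Explorer)
// run Main.aspx's scripts, then reuse the resulting cookies with HttpWebRequest.
public static class BrowserCookies
{
    // Parses a document.cookie-style string ("a=1; b=2") into a CookieContainer
    // scoped to the host of 'site'. Entries without a name are skipped.
    public static CookieContainer ToCookieContainer(Uri site, string cookieString)
    {
        var jar = new CookieContainer();
        if (string.IsNullOrEmpty(cookieString)) return jar;

        foreach (string part in cookieString.Split(';'))
        {
            string pair = part.Trim();
            int eq = pair.IndexOf('=');
            if (eq <= 0) continue; // empty entry or missing name
            string name = pair.Substring(0, eq);
            string value = pair.Substring(eq + 1);
            jar.Add(new Cookie(name, value, "/", site.Host));
        }
        return jar;
    }

    // Loads the page in a WebBrowser control and returns its script-visible cookies.
    // Must be called on an STA thread, e.g. from a [STAThread] Main.
    public static CookieContainer Fetch(Uri mainPage)
    {
        CookieContainer jar = null;
        using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
        {
            browser.DocumentCompleted += (sender, e) =>
            {
                if (e.Url != browser.Url) return; // a frame finished, not the top-level page
                jar = ToCookieContainer(mainPage, browser.Document.Cookie);
                Application.ExitThread(); // stop the message loop started below
            };
            browser.Navigate(mainPage);
            Application.Run(); // WebBrowser needs a message loop to load pages
        }
        return jar;
    }
}
```

Calling `BrowserCookies.Fetch` from a `[STAThread]` entry point and assigning the result to each `HttpWebRequest.CookieContainer` before requesting Form.aspx would keep the existing crawler code as it is.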