ScrapySharp Form Submit causing System.AggregateException


I've spent hours racking my brain over why this isn't working.

I'm trying to use ScrapySharp to scrape websites. For now I'm just trying out sample sites before moving on to my actual site.

Every time I call form.Submit() in my program, I get a System.AggregateException (Specified cast is not valid).

My code:

using System;
using System.IO;
using System.Linq;
using System.Net;
using HtmlAgilityPack;
using ScrapySharp.Extensions;
using ScrapySharp.Html;
using ScrapySharp.Html.Forms;
using ScrapySharp.Network;

namespace WebScraper
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            ScrapingBrowser browser = new ScrapingBrowser();

            //set UseDefaultCookiesParser as false if a website returns invalid cookies format
            //browser.UseDefaultCookiesParser = false;
            browser.AllowAutoRedirect = true;
            browser.AllowMetaRedirect = true;
            WebPage homePage = browser.NavigateToPage(new Uri("http://the-internet.herokuapp.com/login"));

            PageWebForm form = homePage.FindForm("login");
            form["username"] = "tomsmith";
            form["password"] = "SuperSecretPassword!";
            form.Method = HttpVerb.Get; //I tried both .Post and .Get
            WebPage resultsPage = form.Submit(); //THIS IS WHERE I GET THE ERROR
            Console.WriteLine(resultsPage);

        }
    }
}

My error:

System.AggregateException: One or more errors occurred. (Specified cast is not valid.) ---> System.InvalidCastException: Specified cast is not valid.
  at ScrapySharp.Network.ScrapingBrowser.CreateRequest (System.Uri url, ScrapySharp.Network.HttpVerb verb) [0x0000b] in <0a639adc663f45108f057c429262c620>:0
  at ScrapySharp.Network.ScrapingBrowser.NavigateToPageAsync (System.Uri url, ScrapySharp.Network.HttpVerb verb, System.String data, System.String contentType) [0x00066] in <0a639adc663f45108f057c429262c620>:0
   --- End of inner exception stack trace ---
  at System.Threading.Tasks.Task.ThrowIfExceptional (System.Boolean includeTaskCanceledExceptions) [0x00011] in /Users/builder/jenkins/workspace/build-package-osx-mono/2019-06/external/bockbuild/builds/mono-x64/external/corert/src/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs:2027
  at System.Threading.Tasks.Task`1[TResult].GetResultCore (System.Boolean waitCompletionNotification) [0x0002b] in /Users/builder/jenkins/workspace/build-package-osx-mono/2019-06/external/bockbuild/builds/mono-x64/external/corert/src/System.Private.CoreLib/src/System/Threading/Tasks/Future.cs:496
  at System.Threading.Tasks.Task`1[TResult].get_Result () [0x00000] in /Users/builder/jenkins/workspace/build-package-osx-mono/2019-06/external/bockbuild/builds/mono-x64/external/corert/src/System.Private.CoreLib/src/System/Threading/Tasks/Future.cs:466
  at ScrapySharp.Network.ScrapingBrowser.NavigateToPage (System.Uri url, ScrapySharp.Network.HttpVerb verb, System.String data, System.String contentType) [0x0000b] in <0a639adc663f45108f057c429262c620>:0
  at ScrapySharp.Html.Forms.PageWebForm.Submit () [0x00023] in <0a639adc663f45108f057c429262c620>:0
  at WebScraper.MainClass.Main (System.String[] args) [0x00065] in /Users/arib/Projects/WebScraper/WebScraper/Program.cs:29

I'm so tired of this error. Any and all help is much appreciated. Thank you.


Solution

  • The problem is that form["username"] gives you a string, not the FormField itself. Get the FormField from form.FormFields and set its Value instead:

    WebPage homePage = browser.NavigateToPage(new Uri("http://the-internet.herokuapp.com/login"));
    PageWebForm form = homePage.FindForm("login");
    var formFields = form.FormFields;
    foreach (var field in formFields)
    {
        if (field.Name.Equals("username", StringComparison.OrdinalIgnoreCase))
        {
            field.Value = "tomsmith";
        }
        else if (field.Name.Equals("password", StringComparison.OrdinalIgnoreCase))
        {
            field.Value = "SuperSecretPassword!";
        }
    }

    WebPage resultsPage = form.Submit();
    Console.WriteLine(resultsPage);

    Alternatively, you could use Find() to get the FormField and then set its Value:

    var usernameField = form.FormFields.Find(x => x.Name == "username");
    usernameField.Value = "tomsmith";
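
One caveat with Find(): it returns null when no field matches, so a typo in the field name would surface as a NullReferenceException. A minimal sketch of a null-safe helper follows. The FormField class here is only a stand-in so the snippet compiles without ScrapySharp (an assumption based on the Name and Value properties used above), and FormFieldHelper and TrySetField are hypothetical names, not part of the library.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for ScrapySharp's form field type. Assumption: the real type exposes
// settable Name and Value strings, as the loop above suggests.
// Remove this class when compiling against ScrapySharp itself.
public class FormField
{
    public string Name { get; set; }
    public string Value { get; set; }
}

// Hypothetical helper, not part of ScrapySharp.
public static class FormFieldHelper
{
    // Sets the value of the first field whose name matches, ignoring case.
    // Returns false when no such field exists instead of throwing.
    public static bool TrySetField(List<FormField> fields, string name, string value)
    {
        // List<T>.Find returns null for reference types when nothing matches.
        FormField field = fields.Find(
            f => string.Equals(f.Name, name, StringComparison.OrdinalIgnoreCase));
        if (field == null)
        {
            return false;
        }

        field.Value = value;
        return true;
    }
}
```

Assuming form.FormFields is a List<FormField>, as the Find() call above implies, this could be called as FormFieldHelper.TrySetField(form.FormFields, "username", "tomsmith") before form.Submit(), checking the return value to catch a misspelled field name.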