
C#: stumped by a screen-scraping issue on an ASPX page


I'm having trouble scraping the HTML that a site returns from a postback. It is an ASPX page, and I'm trying to get the HTML it generates.

I have looked at the cookies, session data, and form data being sent (using Chrome developer tools), and I still cannot get the page to respond with the search results, even though my code mimics almost all of it.

There are 3 dropdowns on the page, 2 of which are pre-populated when you first visit it. Selecting a value in either of the first 2 triggers a postback, and once both are chosen the 3rd dropdown gets populated. After selecting a value in the 3rd dropdown, you click the search button and the results come back in a table below.

After clicking the search button and getting the results on screen, I went into developer tools, grabbed every value that looked relevant (especially all of the form values), and put them in my code, but still no luck. I even copied the big viewstate exactly.

Here is one of the many code samples I've tried. Admittedly, I'm not very familiar with some of these classes and have been trying different snippets.

I'm not sure whether my code is wrong or whether I'm just missing form data or cookies needed to make the POST go through and return the correct data. My code does return HTML into the responseInString variable, but it looks like the first version of the page (as if you visited it for the first time): no dropdown has a selection and the 3rd one has no values. So I don't know whether my code is actually reaching the code-behind and performing the form POST.

Any help would be greatly appreciated. Thank you!

using (var wb = new WebClient())
{
    var data = new NameValueCollection();
    data["_EVENTTARGET"] = "";
    data["_EVENTARGUMENT"] = "";
    data["_LASTFOCUS"] = "";
    data["_VIEWSTATE"] = "(giant viewstate)";
    data["__VIEWSTATEGENERATOR"] = "D86C5D2F";
    //3 more form input/select fields after this with values corresponding to the drop downs.

    wb.Headers.Add(HttpRequestHeader.Cookie,
        ".ASPXANONYMOUS=(long string);" +
        "ASP.NET_SessionId=(Redacted);" +
        " _gid=GA1.2.1071490528.1676265043;" +
        "LoginToken=(Redacted);" +
        "LoginUserID=(Redacted);" +
        "_ga=GA1.1.1195633641.1675746985;" +
        "_ga_38VTY8CNGZ=GS1.1.1676265043.7.1.1676265065.0.0.0");
    wb.Headers.Add("Sec-Fetch-Dest", "document");
    wb.Headers.Add("Sec-Fetch-Mode", "navigate");
    wb.Headers.Add("Sec-Fetch-Site", "same-origin");
    wb.Headers.Add("Sec-Fetch-User", "?1");
    wb.Headers.Add("Content-Type", "application/x-www-form-urlencoded");

    var response = wb.UploadValues("(the web page url)", "POST", data);
    string responseInString = Encoding.UTF8.GetString(response);

    return responseInString;
}

Solution

  • The problem could be related to several things:

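    Check that you are sending the correct names and values for the __EVENTTARGET and __EVENTARGUMENT parameters. ASP.NET uses them to decide which server-side event (for example a dropdown's SelectedIndexChanged) to raise for the postback. These hidden fields are named with two leading underscores (__EVENTTARGET, __EVENTARGUMENT, __LASTFOCUS, __VIEWSTATE), but the code in the question uses a single underscore for several of them, so the server does not recognize those fields. Without a __VIEWSTATE or __EVENTTARGET field, ASP.NET most likely treats the request as a first visit rather than a postback, which matches the symptom of getting the initial page back.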
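
    A minimal sketch of how the two kinds of postback on this page differ, assuming the dropdowns use AutoPostBack and the search button is a regular submit button. The class name `PostBack` and the control names in the usage below (`ctl00$Main$ddlFirst`, `ctl00$Main$btnSearch`) are hypothetical; copy the real `name` attributes from the page source. A dropdown change posts the dropdown's name in __EVENTTARGET, while a submit button leaves __EVENTTARGET empty and instead posts its own name/value pair, which is how the server knows the search button was clicked.

```csharp
using System.Collections.Specialized;

// Hypothetical helper: builds ASP.NET Web Forms postback bodies from the page's
// hidden fields (__VIEWSTATE, __EVENTVALIDATION, ...) plus the current dropdown values.
public static class PostBack
{
    // Copies hidden fields and selected values into a new collection,
    // leaving the caller's collections untouched.
    private static NameValueCollection Merge(NameValueCollection hiddenFields, NameValueCollection selections)
    {
        var form = new NameValueCollection(hiddenFields);
        foreach (string key in selections)
        {
            form[key] = selections[key]; // overwrite rather than append
        }
        return form;
    }

    // Mirrors what __doPostBack('<dropdown name>', '') submits when an AutoPostBack dropdown changes.
    public static NameValueCollection ForDropDownChange(
        NameValueCollection hiddenFields, NameValueCollection selections, string dropDownName)
    {
        var form = Merge(hiddenFields, selections);
        form["__EVENTTARGET"] = dropDownName;
        form["__EVENTARGUMENT"] = "";
        return form;
    }

    // Mirrors what the browser submits when a standard <input type="submit"> button is clicked.
    public static NameValueCollection ForButtonClick(
        NameValueCollection hiddenFields, NameValueCollection selections,
        string buttonName, string buttonValue)
    {
        var form = Merge(hiddenFields, selections);
        form["__EVENTTARGET"] = "";
        form["__EVENTARGUMENT"] = "";
        form[buttonName] = buttonValue;
        return form;
    }
}
```

    Each result can be passed to UploadValues. Every postback returns a new __VIEWSTATE, so each step should use the hidden fields from the previous response rather than a value copied from the browser.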

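    Are you encoding the form data correctly? With the application/x-www-form-urlencoded content type, each field name and value must be URL-encoded. WebClient.UploadValues already does this for the NameValueCollection you pass it, so do not pre-encode the values with HttpUtility.UrlEncode (the viewstate would end up double-encoded). You only need to encode manually if you build the request body string yourself.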
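
    For comparison, a sketch of encoding the body by hand, for the case where you build it yourself (for example to log it, or to send it with WebClient.UploadString). It uses WebUtility.UrlEncode from System.Net; the `FormBody` class name is made up for illustration. The viewstate's `+`, `/` and `=` characters and the `$` in control names all get escaped.

```csharp
using System.Collections.Specialized;
using System.Linq;
using System.Net;

public static class FormBody
{
    // Encodes each name and value exactly once, as application/x-www-form-urlencoded expects.
    // Only needed when building the body by hand; UploadValues already does this.
    public static string Build(NameValueCollection fields)
    {
        return string.Join("&", fields.AllKeys.Select(key =>
            WebUtility.UrlEncode(key) + "=" + WebUtility.UrlEncode(fields[key] ?? "")));
    }
}
```

    The result can be sent with `wc.UploadString(url, "POST", FormBody.Build(values))` after setting the Content-Type header to application/x-www-form-urlencoded.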

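    Double-check your cookies. Any cookie the server expects but does not receive can make the request fail. It is safest to have the ASP.NET_SessionId you send and the hidden fields you post come from the same session; cookies copied from an earlier browser session may not match the viewstate you are sending.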
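
    One way to keep the session consistent, sketched under the assumption that you start a fresh session instead of pasting cookies from Chrome: let a CookieContainer collect whatever cookies the server sets on the first GET, and send them back automatically. `CookieAwareWebClient` is a common community pattern, not a framework class. If the site requires login, the LoginToken/LoginUserID cookies would still have to be obtained (for example by posting the login form) or added to `Cookies` by hand.

```csharp
using System;
using System.Net;

// A WebClient that keeps cookies (e.g. ASP.NET_SessionId) between requests,
// instead of relying on a hand-copied Cookie header.
public class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest http)
        {
            // Cookies stored from earlier responses are sent with this request,
            // and Set-Cookie headers on its response are stored back into the container.
            http.CookieContainer = Cookies;
        }
        return request;
    }
}
```

    With this client, a `DownloadString(url)` call first picks up the session cookie, and later UploadValues calls on the same instance reuse it, so the manual Cookie header can be dropped.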

    To debug the problem, capture your HTTP requests with a tool and compare them with the browser's; the differences will show where your request goes wrong. (In Chrome you can log network traffic with chrome://net-export/.)
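
    When comparing the requests, pay particular attention to the hidden fields. Rather than pasting __VIEWSTATE from Chrome, a common approach is to read the hidden inputs (__VIEWSTATE, __VIEWSTATEGENERATOR and, if present, __EVENTVALIDATION) from the HTML the server just returned and post them back. The sketch below uses regular expressions, which assumes quoted attributes and simple markup; an HTML parser such as HtmlAgilityPack is more robust. `AspNetForm` is a hypothetical name.

```csharp
using System.Collections.Specialized;
using System.Net;
using System.Text.RegularExpressions;

public static class AspNetForm
{
    private static readonly Regex InputTag = new Regex(@"<input\b[^>]*>", RegexOptions.IgnoreCase);
    private static readonly Regex HiddenType = new Regex(@"(?<![\w-])type\s*=\s*[""']hidden[""']", RegexOptions.IgnoreCase);
    private static readonly Regex NameAttr = new Regex(@"(?<![\w-])name\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase);
    private static readonly Regex ValueAttr = new Regex(@"(?<![\w-])value\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase);

    // Returns name -> value for every <input type="hidden"> in the page.
    public static NameValueCollection ParseHiddenFields(string html)
    {
        var fields = new NameValueCollection();
        foreach (Match tag in InputTag.Matches(html))
        {
            if (!HiddenType.IsMatch(tag.Value)) continue;

            Match name = NameAttr.Match(tag.Value);
            if (!name.Success) continue;

            Match value = ValueAttr.Match(tag.Value);
            // Attribute values are HTML-encoded in the markup (&amp; etc.), so decode them.
            fields[name.Groups[1].Value] = value.Success
                ? WebUtility.HtmlDecode(value.Groups[1].Value)
                : "";
        }
        return fields;
    }
}
```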

    Here is the code, formatted so it's a bit easier to read:

    using System;
    using System.Collections.Specialized;
    using System.Net;
    using System.Text;
    
    class Program
    {
      static void Main(string[] args)
      {
        // Dispose the WebClient when done.
        using (var wc = new WebClient())
        {
          wc.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";
          wc.Headers[HttpRequestHeader.Cookie] =
                  ".ASPXANONYMOUS=(long string);" +
                  "ASP.NET_SessionId=(Redacted);" +
                  "_gid=GA1.2.1071490528.1676265043;" +
                  "LoginToken=(Redacted);" +
                  "LoginUserID=(Redacted);" +
                  "_ga=GA1.1.1195633641.1675746985;" +
                  "_ga_38VTY8CNGZ=GS1.1.1676265043.7.1.1676265065.0.0.0";

          // ASP.NET hidden field names start with two underscores.
          // UploadValues URL-encodes these names and values itself.
          var values = new NameValueCollection
          {
              { "__EVENTTARGET", "" },
              { "__EVENTARGUMENT", "" },
              { "__LASTFOCUS", "" },
              { "__VIEWSTATE", "(giant viewstate)" },
              { "__VIEWSTATEGENERATOR", "D86C5D2F" },
              // Add the remaining form input/select fields with values corresponding to the drop downs.
          };

          byte[] responseBytes = wc.UploadValues("(the web page url)", "POST", values);
          string responseInString = Encoding.UTF8.GetString(responseBytes);
          Console.WriteLine(responseInString);
        }
      }
    }