aspNetHidden div not being served depending on client


I am developing a C# app that gets web pages and processes their contents line by line. To do this, I use the HttpClient class, and read the page contents through ReadAsStreamAsync(). Then I read the stream into a line array and iterate over it. So far so good.

However, the HTML that I obtain with this method is not identical to the HTML that I observe if I navigate to the web page using Chrome or Edge and use View Source to get to the HTML. In particular, the __VIEWSTATE and __VIEWSTATEGENERATOR hidden input elements are surrounded by div elements with class="aspNetHidden" when I use the browser, but not when I get the HTML programmatically. This ruins my line tracking logic as there are extra lines in the page as seen by the browser in relation to the page I am getting in code.

EDIT. After some testing, I am confident that the user agent header employed by the client is what determines whether or not the class="aspNetHidden" div is served. When I mimic my browser's user agent ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36 Edg/83.0.478.37"), the div is served; if I use some other agent such as "Test Client", the div is not served.

My question, then: is there any documentation on which user agent strings cause the div to be served and which don't? Also, can I prevent this from happening?

Thanks.


Solution

  • In short, this is not specified in terms of user agents, but in terms of browser capabilities.

    ASP.NET sets up a set of capabilities based on the browser's user agent.
    These capabilities are configured in .browser configuration files on the web server.
    For .NET 4, for example, you find these files in %SystemRoot%\Microsoft.NET\Framework\v4.0.30319\config\browsers,
    e.g. chrome.browser, iphone.browser, etc.

    Such a .browser file can declare a tagwriter capability.
    E.g. chrome.browser:

    <browsers>
        <!-- Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/530.1 (KHTML, like Gecko) Chrome/2.0.168.0 Safari/530.1 -->
        <browser id="Chrome" parentID="WebKit">
            <identification>
                <userAgent match="Chrome/(?'version'(?'major'\d+)(\.(?'minor'\d+)?)\w*)" />
            </identification>
    
            <capabilities>
              <capability name="browser"   value="Chrome" />
              <capability name="tagwriter" value="System.Web.UI.HtmlTextWriter" />
    
              <!-- ... -->  
            </capabilities>
        </browser>
    </browsers> 
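
    The effect of the userAgent match attribute above can be tried out directly. The following is a minimal sketch that applies only the single pattern shown in chrome.browser; the real ASP.NET lookup also evaluates the parent definitions (parentID="WebKit" and its ancestors), which are not modeled here. The class and method names (ChromeUserAgentMatch, Matches, MajorVersion) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ChromeUserAgentMatch
{
    // Pattern copied from the <userAgent match="..."> attribute in chrome.browser.
    private static readonly Regex ChromePattern =
        new Regex(@"Chrome/(?'version'(?'major'\d+)(\.(?'minor'\d+)?)\w*)");

    // True when the user agent satisfies this one identification rule.
    public static bool Matches(string userAgent) => ChromePattern.IsMatch(userAgent);

    // Returns the captured 'major' group, or an empty string when there is no match.
    public static string MajorVersion(string userAgent)
    {
        Match m = ChromePattern.Match(userAgent);
        return m.Success ? m.Groups["major"].Value : string.Empty;
    }

    public static void Main()
    {
        const string browserAgent =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
            "(KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36 Edg/83.0.478.37";

        Console.WriteLine(Matches(browserAgent));      // True
        Console.WriteLine(Matches("Test Client"));     // False
        Console.WriteLine(MajorVersion(browserAgent)); // 83
    }
}
```

    The Edge user agent from the question contains "Chrome/83.0.4103.61", so it satisfies this rule, while "Test Client" does not and would fall through to whatever the default definition declares.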
    

    The tagwriter capability specifies whether a System.Web.UI.HtmlTextWriter or a System.Web.UI.Html32TextWriter will be instantiated to write the output.

    The default configuration in the Default.browser file declares tagwriter as:

    <capability name="tagwriter" value="System.Web.UI.Html32TextWriter" />
    

    Also, if the tagwriter capability is missing, an Html32TextWriter is used.
    From the Microsoft reference source:

    internal HtmlTextWriter CreateHtmlTextWriterInternal(TextWriter tw) {
        Type tagWriter = TagWriter;
        if (tagWriter != null) {
            return Page.CreateHtmlTextWriterFromType(tw, tagWriter);
        }
    
        // Fall back to Html 3.2
        return new Html32TextWriter(tw);
    }
    

    The Html32TextWriter declares that it does not render a div around hidden input fields.
    From the Microsoft reference source:

    internal override bool RenderDivAroundHiddenInputs {
        get {
            return false;
        }
    }
    

    The HtmlTextWriter does return true for RenderDivAroundHiddenInputs, see the Microsoft reference source.

    Some more reading about all this here.


    What you can do.

    If you always want the wrapping div, use one of the well-known user agents; otherwise use a custom one like the Test Client you are already using.
    If you control the website being requested, you can set up a custom .browser file for your custom user agent ... but I would rather not go that way ...

    When making the request, just set the appropriate User-Agent request header on your HttpClient, e.g.:

    using System.Net.Http;

    var client = new HttpClient();
    var userAgent = "Test Client"; // Or "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36 Edg/83.0.478.37"
    // DefaultRequestHeaders apply to every request sent through this client.
    client.DefaultRequestHeaders.Add("User-Agent", userAgent);
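
    To check which variant of the page a given user agent receives, a small helper can fetch the HTML and look for the wrapper div. This is only a sketch: AspNetHiddenCheck, HasHiddenWrapper and FetchAsync are hypothetical names, and the substring test assumes the server writes the attribute as class="aspNetHidden" with double quotes.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class AspNetHiddenCheck
{
    // True when the markup contains the class attribute of the wrapper div.
    // Assumption: the attribute is written as class="aspNetHidden".
    public static bool HasHiddenWrapper(string html) =>
        html.IndexOf("class=\"aspNetHidden\"", StringComparison.OrdinalIgnoreCase) >= 0;

    // Downloads the page at 'url' while presenting the given User-Agent header.
    public static async Task<string> FetchAsync(string url, string userAgent)
    {
        using var client = new HttpClient();
        client.DefaultRequestHeaders.Add("User-Agent", userAgent);
        return await client.GetStringAsync(url);
    }
}
```

    Calling HasHiddenWrapper(await FetchAsync(pageUrl, "Test Client")) and then repeating the call with the browser's user agent should show whether the two responses differ for the site in question (pageUrl stands for whatever page you are processing).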