
ASP.NET Core TagHelper — how to get attribute value of descendant elements?


TL;DR

In my ASP.NET Core library, I'm implementing a TagHelper that targets <form> elements. To operate, it needs to know the effective value of the name attribute of all <input> elements inside its <form> element. How can that be achieved?


What I'm trying to achieve

For a little more context about what I'm trying to achieve with this, I'm implementing a reusable honeypot for detecting spambots. I want the consumers of my library to just set an attribute in their form (like <form use-honeypot="true">) and have the corresponding tag helper append some bait (visually hidden) input fields to the form. Bots would then be tricked into filling in those fields, while humans won't even see them.

For the bait input fields to be effective, they should have attractive common names (like "email", "phone", or "message") to encourage bots to fill them. The tag helper will choose the appropriate names from an internal list of common names, but it needs to make sure that it doesn't pick a name that is already in use by a legit input field in the form. That's why I need a way to obtain the names of each descendant input/control in the form.

What I already considered

Parsing child content

I'm aware that TagHelpers can get the child content of the element they target, through members of TagHelperOutput. However, as far as I know, these members only return the content as a string. This would require parsing the string, which doesn't seem ideal. It also seems counterproductive, and potentially performance heavy.

Using a specialized tag helper just for reading

I thought of creating another TagHelper that targets the <input> elements (and some other form controls) and has an Order of int.MaxValue, that only reads the name attribute of its element and stores it in the HttpContext. This way, the <form> tag helper could then get those names from the HttpContext.

However, I believe the problem with this approach is that I have no guarantee that this "reader" tag helper will ever be executed: it depends on how (and whether) the consumer uses @addTagHelper to import the tag helper, and it can be opted out of using the ! tag helper opt-out operator. Also, there could be dynamically generated form fields that are not targeted by tag helpers.

Some additional observations

  • I would like to avoid depending on third party libraries as much as possible. This is because I wish to have no dependencies on my library, other than the SDK.
  • Note that I'm looking for obtaining the name attribute on descendant elements of the form element, not just its immediate children.
  • Because this is implemented in a library, I should not depend on the consumer's own controller or page implementation — i.e. the consumer should not have to do much to make this work, other than opting-in using something like <form use-honeypot="true">.

Solution

  • ASP.NET Core does not provide a feature that meets all of your criteria. Here is the list that I extracted from your question:

    1. No string parsing.
    2. Obtaining "the name attribute on descendant elements of the form element, not just its immediate children".
    3. "I wish to have no dependencies on my library, other than the SDK".
    4. There "could be dynamically generated form fields, that are not targeted by tag helpers".

    The solution that would've worked is described in the following archived posts from 2015 on the Razor subsection of the aspnet project on GitHub.

    Post: Different TagHelpers with the same element name, depending on scope #474

    Note the comment:
    "I'm thinking as long as there's a parent (no matter how far up) that fulfills the parent tag then the TagHelper should apply. Thoughts?"

    And then the follow-up comment:
    "That doesn't sound like it's the feature people want. I would think it has to be immediate parent only."

    Post: Hierarchical Parent Tag restriction #4981

    This post fits your situation:
    "We're working on a custom Tag Helpers package, and one feature we really miss is a parent tag restriction based on tag hierarchy (not only immediate parent). ... We have made a research and found that it is fairly easy to implement hierarchical parent restriction."

    The resolution to the post:
    "We're closing this issue as there was not much community interest in this ask for quite a while now."

    If I were writing the spec, I would rename the ParentTag property of the HtmlTargetElement attribute to DescendentOfTag and apply it to all nested elements. Under that hypothetical design (DescendentOfTag does not exist in ASP.NET Core), the following Tag Helper would run for any element nested in <form-x> that has a name attribute.

    [HtmlTargetElement(DescendentOfTag = "form-x", Attributes = "name")]
    

    Comparing existing options

    Solution: ParentTag
    Pros: Uses built-in methods to exchange data between parent and child tags.
    Failed to meet criteria: Only works for elements that are direct children of a parent. It does not work for nested elements.

    Solution: RegEx
    Pros: Simple RegEx pattern to extract attribute value.
    Failed to meet criteria: Requires string parsing that can be brittle for unexpected variations of inner HTML strings.

    Solution: HTML Parser
    Pros: Well-established with years of development; active on GitHub; "tolerant of malformed HTML"; once HTML is parsed, more values are easily extracted.
    Failed to meet criteria: Creates a project dependency.

    Sample summary

    Solution: ParentTag

    Parent tag helper
        [HtmlTargetElement("form-x")]
    Child tag helper
        [HtmlTargetElement(ParentTag = "form-x")]

    Solution: RegEx

        Regex regex = new(@"(?<=\bname="")[^""]*");
        IEnumerable<string> existingNames = regex.Matches(childContentHtml)
            .Cast<Match>()
            .Select(match => match.Value);

    Solution: HTML Parser (HTML Agility Pack)

        // SelectNodes returns null (not an empty list) when nothing matches
        IEnumerable<string> existingNames = htmlDocument.DocumentNode
            .SelectNodes("//*/@name")
            ?.Select(node => node.GetAttributeValue("name", ""))
            ?? Enumerable.Empty<string>();

    Solution: HTML Parser (AngleSharp)

        IEnumerable<string?> existingNames = document.QuerySelectorAll("*")
            .Select(m => m.GetAttribute("name"))
            .Where(s => s != null);

    Class file with Tag Helper solutions listed in the above table

    (Note: The "hidden" controls in the following class carry a custom attribute, data-type="hidden", so that they can be displayed for testing purposes. See the image after the 'Razor Page view' heading. For real use, replace the custom data-type attribute with type="hidden".)

    Add the following NuGet packages to your project: AngleSharp and HtmlAgilityPack.

    using AngleSharp.Dom;
    using AngleSharp;
    using HtmlAgilityPack;
    using Microsoft.AspNetCore.Razor.TagHelpers;
    using System.Text.RegularExpressions;
    
    namespace WebApplication1.Helpers
    {
        /* --------------------------------------------------
        * RegEx
        */
        [HtmlTargetElement(Attributes = "use-honeypot-regex")]
        public class HoneyPotTagHelper : TagHelper
        {
            public override async Task ProcessAsync(TagHelperContext context, TagHelperOutput output)
            {
                // Remove the attribute that triggered this Tag Helper
                output.Attributes.Remove(output.Attributes.First(t => t.Name == "use-honeypot-regex"));
    
                TagHelperContent childContent = await output.GetChildContentAsync();
                string childContentHtml = childContent.GetContent();
    
                // https://stackoverflow.com/questions/5526094/regex-to-extract-attribute-values
                Regex regex = new(@"(?<=\bname="")[^""]*");
    
                // https://stackoverflow.com/questions/12730251/convert-result-of-matches-from-regex-into-list-of-string/12730562#12730562
                IEnumerable<string> existingNames = regex.Matches(childContentHtml).Cast<Match>().Select(match => match.Value).AsEnumerable();
    
                IEnumerable<string> knownNames = ["email", "phone", "message"];
    
                // var result = list1.Except(list2);
                // "will give you all items in list1 that are not in list2."
                // https://stackoverflow.com/questions/11418942/linq-find-all-items-in-one-list-that-arent-in-another-list
                IEnumerable<string> unusedNames = knownNames.Except(existingNames, StringComparer.OrdinalIgnoreCase);
    
                // Set a default value if there are no unused entries from the known names list
                string nameStr = unusedNames.Any() ? unusedNames.First() : "default";
    
                // 'data-type' attribute is used for testing purposes. Change to 'type'.
                output.PostContent.SetHtmlContent($"<input data-type=\"hidden\" name=\"{nameStr}\" placeholder=\"{nameStr}\" />");
            }
        }
    
        /* --------------------------------------------------
         * AngleSharp
         * https://github.com/AngleSharp/AngleSharp
         * https://anglesharp.github.io
         * https://github.com/AngleSharp/AngleSharp/issues/199#issuecomment-164123291
         */
        [HtmlTargetElement(Attributes = "use-honeypot-angle")]
        public class HoneyPotAngleTagHelper : TagHelper
        {
            public override async Task ProcessAsync(TagHelperContext context, TagHelperOutput output)
            {
                // Remove the attribute that triggered this Tag Helper
                output.Attributes.Remove(output.Attributes.First(t => t.Name == "use-honeypot-angle"));
    
                TagHelperContent childContent = await output.GetChildContentAsync();
                string childContentHtml = childContent.GetContent();
    
                // Create a new context for evaluating web pages with the default config
                IBrowsingContext browsingContext = BrowsingContext.New(Configuration.Default);
    
                // Create a document from a virtual request / response pattern
                IDocument document = await browsingContext.OpenAsync(req => req.Content(childContentHtml));
    
                IEnumerable<string?> existingNames = document.QuerySelectorAll("*")
                    .Select(m => m.GetAttribute("name"))
                    .Where(s => s != null);
    
                IEnumerable<string> knownNames = ["email", "phone", "message"];
    
                // var result = list1.Except(list2);
                // "will give you all items in list1 that are not in list2."
                // https://stackoverflow.com/questions/11418942/linq-find-all-items-in-one-list-that-arent-in-another-list
                IEnumerable<string?> unusedNames = knownNames.Except(existingNames, StringComparer.OrdinalIgnoreCase);
    
                // Set a default value if there are no unused entries from the known names list
                string? nameStr = unusedNames is not null && unusedNames.Any() ? unusedNames.First() : "default";
    
                // 'data-type' attribute is used for testing purposes. Change to 'type'.
                output.PostContent.SetHtmlContent($"<input data-type=\"hidden\" name=\"{nameStr}\" placeholder=\"{nameStr}\" />");
            }
        }
    
        /* --------------------------------------------------
         * HTML Agility Pack
         * https://github.com/zzzprojects/html-agility-pack
         * https://html-agility-pack.net
         * https://stackoverflow.com/questions/tagged/html-agility-pack
         * https://stackoverflow.com/questions/11526554/get-all-the-divs-ids-on-a-html-page-using-html-agility-pack
         */
    
        [HtmlTargetElement(Attributes = "use-honeypot-agility")]
        public class HoneyPotAgilityTagHelper : TagHelper
        {
            public override async Task ProcessAsync(TagHelperContext context, TagHelperOutput output)
            {
                // Remove the attribute that triggered this Tag Helper
                output.Attributes.Remove(output.Attributes.First(t => t.Name == "use-honeypot-agility"));
    
                TagHelperContent childContent = await output.GetChildContentAsync();
                string childContentHtml = childContent.GetContent();
    
                HtmlDocument htmlDocument = new();
                htmlDocument.LoadHtml(childContentHtml);
    
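                // SelectNodes returns null (not an empty list) when nothing matches,
                // e.g. for a form without any named controls.
                IEnumerable<string> existingNames = htmlDocument.DocumentNode
                    .SelectNodes("//*/@name")
                    ?.Select(node => node.GetAttributeValue("name", ""))
                    ?? Enumerable.Empty<string>();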
    
                IEnumerable<string> knownNames = ["email", "phone", "message"];
    
                // var result = list1.Except(list2);
                // "will give you all items in list1 that are not in list2."
                // https://stackoverflow.com/questions/11418942/linq-find-all-items-in-one-list-that-arent-in-another-list
                IEnumerable<string> unusedNames = knownNames.Except(existingNames, StringComparer.OrdinalIgnoreCase);
    
                // Set a default value if there are no unused entries from the known names list
                string nameStr = unusedNames.Any() ? unusedNames.First() : "default";
    
                // 'data-type' attribute is used for testing purposes. Change to 'type'.
                output.PostContent.SetHtmlContent($"<input data-type=\"hidden\" name=\"{nameStr}\" placeholder=\"{nameStr}\" />");
            }
        }
    
        /* --------------------------------------------------
        * Parent/Child Tag Helper
        */
        [HtmlTargetElement(ParentTag = "form-x")]
        public class GetDescendantNamesTagHelper : TagHelper
        {
            public override void Init(TagHelperContext context)
            {
                string? valueOfNameAttr = context.AllAttributes.FirstOrDefault(t => t.Name == "name")?.Value.ToString();
                if (string.IsNullOrEmpty(valueOfNameAttr)) { return; }
                NameContext.ExistingNames.Add(valueOfNameAttr);
            }
        }
    
        [HtmlTargetElement("form-x")]
        public class GetDescendantNamesParentTagHelper : TagHelper
        {
            public override async Task ProcessAsync(TagHelperContext context, TagHelperOutput output)
            {
                output.TagName = "form";
    
                // "Only if I called await output.GetChildContentAsync(); inside
                // the parent ProcessAsync method then it awaits the children
                // Init to be run before process method of the parent" – OvadyaShachar Commented Nov 6, 2019 at 4:34
                // https://stackoverflow.com/questions/56625896/tag-helper-order-of-execution/58722955#58722955
                await output.GetChildContentAsync();
    
                IEnumerable<string> knownNames = ["email", "phone", "message"];
    
                // var result = list1.Except(list2);
                // "will give you all items in list1 that are not in list2."
                // https://stackoverflow.com/questions/11418942/linq-find-all-items-in-one-list-that-arent-in-another-list
                IEnumerable<string> unusedNames = knownNames.Except(NameContext.ExistingNames, StringComparer.OrdinalIgnoreCase);
    
                string nameStr = unusedNames.Any() ? unusedNames.First() : "default";
    
                // 'data-type' attribute is used for testing purposes. Change to 'type'.
                output.PostContent.SetHtmlContent($"<input data-type=\"hidden\" name=\"{nameStr}\" placeholder=\"{nameStr}\" />");
    
                // Clear the existing names list for processing the next `form-x` Tag Helper
                NameContext.ExistingNames = [];
            }
        }
    
        public static class NameContext
        {
            public static List<string> ExistingNames { get; set; } = [];
        }
    }
    

    Razor Page view

    @page
    @model WebApplication1.Pages.IndexModel
    @addTagHelper WebApplication1.Helpers.HoneyPotTagHelper, WebApplication1
    @addTagHelper WebApplication1.Helpers.HoneyPotAgilityTagHelper, WebApplication1
    @addTagHelper WebApplication1.Helpers.HoneyPotAngleTagHelper, WebApplication1
    @addTagHelper WebApplication1.Helpers.GetDescendantNamesParentTagHelper, WebApplication1
    @addTagHelper WebApplication1.Helpers.GetDescendantNamesTagHelper, WebApplication1
    
    <style>
        form {
            display: flex;
            grid-gap: 1rem;
            align-items: start;
        }
    
        input[data-type="hidden"] {
            border: 1px solid red;
        }
    
        .textarea-container {
            padding: 1rem;
            border: 1px solid #e1e1e1;
        }
    </style>
    
    <h3>RegEx</h3>
    <form use-honeypot-regex name="myForm">
        <input type="text" name="name" placeholder="Name" />
        <input type="tel" name="email" placeholder="Email" />
        <div>
            <textarea name="message" placeholder="Message">Message</textarea>
        </div>
    </form>
    
    <h3>HTML Agility Pack - HTML Parser</h3>
    <form use-honeypot-agility name="myForm">
        <input type="text" name="name" placeholder="Name" />
        <input type="tel" name="email" placeholder="Email" />
        <div>
            <textarea name="message" placeholder="Message">Message</textarea>
        </div>
    </form>
    
    <h3>Angle Sharp - HTML Parser</h3>
    <form use-honeypot-angle name="myForm">
        <input type="text" name="name" placeholder="Name" />
        <input type="tel" name="phone" placeholder="Phone" />
        <div>
            <textarea name="message" placeholder="Message">Message</textarea>
        </div>
    </form>
    
    <h3>ParentTag - Form with direct child controls</h3>
    <form-x name="myForm-2">
        <input type="text" name="name" placeholder="Name" />
        <input type="tel" name="phone" placeholder="Phone" />
        <textarea name="message" placeholder="Message">Message</textarea>
    </form-x>
    
    <h3>ParentTag - Form with nested child controls</h3>
    <form-x name="myForm">
        <input type="text" name="name" placeholder="Name" />
        <input type="email" name="email" placeholder="Email" />
        <div class="textarea-container">
            <textarea name="message">The name of this text area is not retrieved because this control is within a div, and therefore, is not a direct child of the form.</textarea>
        </div>
    </form-x>
    

    Output sample

    <form name="myForm">
        <input type="text" name="name" placeholder="Name">
        <input type="tel" name="phone" placeholder="Phone">
        <div>
            <textarea name="message" placeholder="Message">A text area.</textarea>
        </div>
        <input type="hidden" name="email">
    </form>
    

    ("Hidden" elements are displayed with a red border in the following image.)

    [Image: rendered forms with the honeypot inputs outlined in red]

    Notes

    Using RegEx to extract all values of name attributes

    The RegEx solution uses a RegEx pattern to parse the child HTML output. The pattern (?<=\btitle=")[^"]* comes from this SO answer and was modified to search for the name attribute instead of title.

    The RegEx pattern handles nested child elements.
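
    To illustrate the brittleness mentioned in the comparison above, here is a minimal sketch of the extraction step pulled out of the Tag Helper so it can be tried in isolation. The names NameAttributeScanner, ExtractNames and PickUnusedName are hypothetical, and the pattern is a variation of the one in the answer, not the same pattern: it also accepts single-quoted values and requires whitespace before name so that attributes such as data-name are skipped. Unquoted attribute values are still not handled.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;

    // Hypothetical helper, not part of ASP.NET Core or of the answer's class file.
    public static class NameAttributeScanner
    {
        // Matches name="value" or name='value' when preceded by whitespace.
        // .NET allows the same group name (v) in both alternatives.
        private static readonly Regex NamePattern = new(
            @"\sname\s*=\s*(?:""(?<v>[^""]*)""|'(?<v>[^']*)')",
            RegexOptions.IgnoreCase);

        // Returns every name attribute value found in the HTML string,
        // regardless of how deeply the element is nested.
        public static IReadOnlyList<string> ExtractNames(string html) =>
            NamePattern.Matches(html)
                .Select(match => match.Groups["v"].Value)
                .ToList();

        // Returns the first candidate that is not already in use,
        // or the fallback when every candidate is taken.
        public static string PickUnusedName(
            IEnumerable<string> existing,
            IEnumerable<string> candidates,
            string fallback) =>
            candidates.Except(existing, StringComparer.OrdinalIgnoreCase)
                .FirstOrDefault() ?? fallback;
    }
    ```

    Inside the Tag Helper, childContentHtml would be passed to ExtractNames and the result to PickUnusedName together with the list of known names.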

    ParentTag

    The ParentTag solution works only for elements that are direct children of the form tag. It does not work for nested control elements, such as the textarea in the following sample:

    <form-x name="myForm">
        <input type="text" name="name" placeholder="Name" />
        <input type="email" name="email" placeholder="Email" />
        <div>
            <textarea name="message">A text area.</textarea>
        </div>
    </form-x>
    

    The ParentTag approach uses a unique tag name for the form to trigger the Tag Helpers versus triggering the Tag Helper with a custom attribute.

    A unique tag name can trigger a Tag Helper for the parent form element and for each child element. This is enabled by setting the ParentTag parameter of the HtmlTargetElement attribute.

    [HtmlTargetElement(ParentTag = "form-x")]
    

    targets all child elements, regardless of the tag type (i.e. <input>, <textarea>, etc.).

    [HtmlTargetElement("form-x")]
    

    targets the parent form element.

    Parent/child Tag Helpers execution order

    Note the use of await output.GetChildContentAsync();. This is important because it forces the required order of execution of the Tag Helpers. The child Tag Helpers need to execute first so that a list of existing name attributes can be built. Only after the Tag Helpers for all child elements have executed does the rest of the parent's ProcessAsync method run.

    From this source:

    "we call output.GetChildContentAsync() which will cause the child tag helpers to execute and set the properties."

    This SO post has a diagram showing the order of execution if output.GetChildContentAsync() is used.

    I used a simple static list property to store existing name attribute values instead of using the Items dictionary off the context object. See this link for more information. The static list made it simple to use LINQ to compare the existing names with the list of known names.
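
    A static list is shared by every request in the process, so two forms rendered concurrently could see each other's names. For comparison, here is a rough sketch of the Items-based variant mentioned above. It assumes the parent stores a mutable list in context.Items during Init, which the child Tag Helpers can then read and add to. The tag name form-y and the class names ItemsParentTagHelper and ItemsChildTagHelper are made up for this example. Like the ParentTag sample, it only sees direct children.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;
    using Microsoft.AspNetCore.Razor.TagHelpers;

    // Hypothetical variant of GetDescendantNamesParentTagHelper.
    [HtmlTargetElement("form-y")]
    public class ItemsParentTagHelper : TagHelper
    {
        // Key under which the per-form list is stored (an assumption of this sketch).
        public static readonly object NamesKey = new();

        public override void Init(TagHelperContext context)
        {
            // Runs before the child Tag Helpers; they see this same list instance.
            context.Items[NamesKey] = new List<string>();
        }

        public override async Task ProcessAsync(TagHelperContext context, TagHelperOutput output)
        {
            output.TagName = "form";

            // Forces the child Tag Helpers to run so that the list is populated.
            await output.GetChildContentAsync();

            var existingNames = (List<string>)context.Items[NamesKey];
            string[] knownNames = ["email", "phone", "message"];
            string nameStr = knownNames
                .Except(existingNames, StringComparer.OrdinalIgnoreCase)
                .FirstOrDefault() ?? "default";

            output.PostContent.SetHtmlContent($"<input type=\"hidden\" name=\"{nameStr}\" />");
        }
    }

    // Hypothetical variant of GetDescendantNamesTagHelper.
    [HtmlTargetElement(ParentTag = "form-y")]
    public class ItemsChildTagHelper : TagHelper
    {
        public override void Init(TagHelperContext context)
        {
            string? name = context.AllAttributes
                .FirstOrDefault(t => t.Name == "name")?.Value?.ToString();

            // Add to the parent's list; the list is mutated, the dictionary is not.
            if (!string.IsNullOrEmpty(name)
                && context.Items.TryGetValue(ItemsParentTagHelper.NamesKey, out var value)
                && value is List<string> names)
            {
                names.Add(name);
            }
        }
    }
    ```

    Because each form-y element creates its own list in Init, nothing has to be cleared after processing and nothing is shared between requests.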

    3rd-Party HTML Parsers

    The HTML parser from the HTML Agility Pack is "very tolerant with real world malformed HTML" per their description on NuGet.org or on GitHub.com.

    "AngleSharp is a .NET library ... standard DOM features such as querySelector or querySelectorAll work for tree traversal."