c# regex regular-language

Regular expression for appSettings tags that skips commented-out tags


I would like to create a regular expression that returns only those appSettings tags which are not commented out. The following is my test string.

<a key="a" value="b"/><b key="b" value="b"/><!--<c key="c" value="c"/>-->
<d key="d" value="d"/>

I've come up with the following regular expression so far.

(?<!<!--)<[^>]*/+>

Here I'm using a negative lookbehind for

<!--

but it's not working. Any ideas?


Solution

  • This is another good opportunity to apply the trash can approach: everything we want goes into the 1st capturing group, the rest goes into the overall match and will be completely disregarded.

    A regex that achieves just that could look like this:

    <!--.*?-->|(<\s*\w+[^>]*>)
    

    Explanation:

    • <!--.*?--> the first alternative matches a whole comment block; the lazy .*? stops at the nearest -->
    • (<\s*\w+[^>]*>) the second alternative captures any simple tag outside a comment into group 1


    So we only keep a match when group 1 ($1) actually captured something.

    Sample Code:

    using System;
    using System.Text.RegularExpressions;
    
    public class Example
    {
        public static void Main()
        {
            string pattern = @"<!--.*?-->|(<\s*\w+[^>]*>)";
            string input = @"<a key=""a"" value=""b""/><b key=""b"" value=""b""/><!--<c key=""c"" value=""c""/>-->
    <d key=""d"" value=""d""/>";
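            // Singleline lets '.' match newlines, so comments spanning several lines are skipped as well.
            RegexOptions options = RegexOptions.Singleline;
    
            foreach (Match m in Regex.Matches(input, pattern, options))
            {
                // Group 1 is set only for tags found outside a comment.
                if (m.Groups[1].Success)
                    Console.WriteLine("'{0}' found at index {1}.", m.Groups[1].Value, m.Groups[1].Index);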
            }
        }
    }
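
One case the test string does not cover is a comment that spans several lines. In .NET, `.` does not match a newline by default, so the comment alternative would fail there and the tag inside the comment would be captured. The following is a minimal sketch, assuming `RegexOptions.Singleline` is acceptable for your input; the `TagExtractor` class and `UncommentedTags` method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagExtractor
{
    // Hypothetical helper: returns only the tags captured by group 1,
    // i.e. tags that are not inside a <!-- ... --> block.
    public static List<string> UncommentedTags(string input)
    {
        var tags = new List<string>();
        // Singleline makes '.' match '\n', so multi-line comments are consumed whole.
        var matches = Regex.Matches(input, @"<!--.*?-->|(<\s*\w+[^>]*>)", RegexOptions.Singleline);
        foreach (Match m in matches)
        {
            if (m.Groups[1].Success)
                tags.Add(m.Groups[1].Value);
        }
        return tags;
    }

    public static void Main()
    {
        string input = "<a key=\"a\" value=\"b\"/><!--\n<c key=\"c\" value=\"c\"/>\n-->\n<d key=\"d\" value=\"d\"/>";
        // Prints the <a .../> and <d .../> tags; <c .../> sits inside the comment and is skipped.
        foreach (string tag in UncommentedTags(input))
            Console.WriteLine(tag);
    }
}
```

Without `RegexOptions.Singleline`, the same input would also yield the `<c .../>` tag, because the comment alternative could no longer match across the line breaks.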
    

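    Another pattern that achieves a similar result for this input uses negative lookarounds to reject a tag that directly follows <!-- or directly precedes -->. It only inspects the text touching the tag, so it does not catch tags deeper inside a comment or tags separated from the comment delimiters by whitespace: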

    (?<!<!--)(<\s*\w+[^>]*>)(?!-->)
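
To see where the two patterns diverge, the sketch below runs both against a comment that wraps three tags. The input string and the `PatternComparison` class with its `Capture` helper are made up for this illustration and are not part of the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    public const string TrashCan = @"<!--.*?-->|(<\s*\w+[^>]*>)";
    public const string Lookaround = @"(?<!<!--)(<\s*\w+[^>]*>)(?!-->)";

    // Hypothetical helper: collects whatever group 1 captured for a given pattern.
    public static List<string> Capture(string pattern, string input)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            if (m.Groups[1].Success)
                result.Add(m.Groups[1].Value);
        }
        return result;
    }

    public static void Main()
    {
        // The comment wraps three tags; only the first and last touch the delimiters.
        string input = "<a k=\"1\"/><!--<c k=\"2\"/><e k=\"3\"/><f k=\"4\"/>--><d k=\"5\"/>";

        // Trash can approach: the a and d tags.
        Console.WriteLine(string.Join(" ", Capture(TrashCan, input)));

        // Lookarounds: the a, e and d tags. The e tag is inside the comment,
        // but neither "<!--" nor "-->" is directly adjacent to it.
        Console.WriteLine(string.Join(" ", Capture(Lookaround, input)));
    }
}
```

For config files where a comment may hold more than two tags, or where whitespace separates a tag from the comment delimiters, the alternation-based pattern is therefore the safer of the two.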
    
