c# .net regex .net-3.5

What am I doing wrong with my Regex?


I am not sure what I am doing wrong. I am trying to use .NET's Regex.Replace, but it keeps replacing the wrong item.

I have two replaces. The first one replaces exactly what I want. The second, which is almost a mirror image of the first, does not.

Here is my sample markup:

<%@ Page Title="Tour" Language="C#" MasterPageFile="~/Views/Shared/Site.Master" Inherits="System.Web.Mvc.ViewPage" %>
<asp:Content ID="Content1" ContentPlaceHolderID="HeadContent" runat="server">
    <title>Website Portfolio Section - VisionWebCS</title>
    <meta name="description" content="A" />
    <meta name="keywords" content="B" />
</asp:Content>
<asp:Content ID="Content2" ContentPlaceHolderID="MainContent" runat="server">
    <!-- **START** -->

I am looking to replace both of the meta tags:

<meta name="description" content="A" />
<meta name="keywords" content="B" />

In my code, I first replace the keywords meta tag with

<meta name="keywords" content="C" />

This works, so my next task is to replace the description meta tag with

<meta name="description" content="D" />

This does not work. Instead of replacing only the "description" tag, it replaces the "description" tag and the "keywords" tag together.

Here is my test program so you can all try it out. Just throw it into a C# console app.

using System;
using System.Text.RegularExpressions;

class Program
{
    private const string META_DESCRIPTION_REGEX = "<\\s* meta \\s* name=\"description\" \\s* content=\"(?<Description>.*)\" \\s* />";
    private const string META_KEYWORDS_REGEX = "<\\s* meta \\s* name=\"keywords\" \\s* content=\"(?<Keywords>.*)\" \\s* />";

    // IgnorePatternWhitespace makes the literal spaces in the patterns above insignificant.
    private static RegexOptions regexOptions = RegexOptions.IgnoreCase
                               | RegexOptions.Multiline
                               | RegexOptions.CultureInvariant
                               | RegexOptions.IgnorePatternWhitespace
                               | RegexOptions.Compiled;

    static void Main(string[] args)
    {
        // The whole page is a single line, as in my real input.
        string text = "<%@ Page Title=\"Tour\" Language=\"C#\" MasterPageFile=\"~/Views/Shared/Site.Master\" Inherits=\"System.Web.Mvc.ViewPage\" %><asp:Content ID=\"Content1\" ContentPlaceHolderID=\"HeadContent\" runat=\"server\">    <title>Website Portfolio Section - VisionWebCS</title>    <meta name=\"description\" content=\"A\" />    <meta name=\"keywords\" content=\"B\" /></asp:Content><asp:Content ID=\"Content2\" ContentPlaceHolderID=\"MainContent\" runat=\"server\"><!-- **START** -->";

        // First replace: the keywords meta tag (this one works).
        Regex regex = new Regex(META_KEYWORDS_REGEX, regexOptions);
        string newKeywords = String.Format("<meta name=\"keywords\" content=\"{0}\" />", "C");
        string output = regex.Replace(text, newKeywords);

        // Second replace: the description meta tag (this one replaces too much).
        Regex regex2 = new Regex(META_DESCRIPTION_REGEX, regexOptions);
        string newDescription = String.Format("<meta name=\"description\" content=\"{0}\" />", "D");
        string newOutput = regex2.Replace(output, newDescription);
        Console.WriteLine(newOutput);
    }
}

This gets me a final output of

<%@ Page Title="Tour" Language="C#" MasterPageFile="~/Views/Shared/Site.Master" Inherits="System.Web.Mvc.ViewPage" %>
<asp:Content ID="Content1" ContentPlaceHolderID="HeadContent" runat="server">
    <title>Website Portfolio Section - VisionWebCS</title>
    <meta name="description" content="D" />
</asp:Content>
<asp:Content ID="Content2" ContentPlaceHolderID="MainContent" runat="server">
    <!-- **START** -->

Thanks


Solution

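  • To answer your question without useless life lessons: you are having trouble because of greedy quantifiers. The greedy .* in the description pattern matches as much as it can, so it runs past the closing quote of the description tag and stops only at the last " /> in the text, which closes the keywords tag; both tags are then replaced as one match. The keywords pattern only appears to work because no later " /> follows it. Try making the quantifiers lazy by adding question marks: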

    <meta\\s+?name=\"description\"\\s+?content=\"(?<Description>.*?)\"\\s*?/>
    

    Sure, this regex won't work for every page in the world, but if you just need a quick replacement script for your own templates, regex is the fastest and easiest way to go.
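
As a rough sketch of the lazy fix applied to both patterns, something like the following should replace each tag independently. The names MetaTagReplacer and ReplaceMeta are made up for illustration and do not come from the question. The patterns here drop the decorative spaces, so IgnorePatternWhitespace is not needed, and the sample input is just the two meta tags from the question on one line.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class MetaTagReplacer
{
    // Lazy capture (.*?) stops at the first closing quote that is followed by "/>".
    private const string DescriptionPattern =
        "<\\s*meta\\s+name=\"description\"\\s+content=\"(?<Description>.*?)\"\\s*/>";
    private const string KeywordsPattern =
        "<\\s*meta\\s+name=\"keywords\"\\s+content=\"(?<Keywords>.*?)\"\\s*/>";

    private const RegexOptions Options =
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant;

    public static string ReplaceMeta(string page, string description, string keywords)
    {
        string newKeywords =
            String.Format("<meta name=\"keywords\" content=\"{0}\" />", keywords);
        string newDescription =
            String.Format("<meta name=\"description\" content=\"{0}\" />", description);

        // A MatchEvaluator returns the text verbatim, so a '$' in the new
        // content is not interpreted as a substitution pattern.
        string output = Regex.Replace(page, KeywordsPattern, m => newKeywords, Options);
        return Regex.Replace(output, DescriptionPattern, m => newDescription, Options);
    }
}

public static class Program
{
    public static void Main()
    {
        string page = "<meta name=\"description\" content=\"A\" />"
                    + "    <meta name=\"keywords\" content=\"B\" />";

        // Prints: <meta name="description" content="D" />    <meta name="keywords" content="C" />
        Console.WriteLine(MetaTagReplacer.ReplaceMeta(page, "D", "C"));
    }
}
```

A lazy quantifier can still run across several tags if the first one is not closed with " />. Writing the capture as [^\"]* instead of .*? is a common alternative, because it cannot cross a quote at all.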