Tags: c#, xml, regex, string

String Replace: Ignore Whitespace


My problem comes down to parsing "XML" (in quotes because the input is not a complete XML document) and to how whitespace is handled along the way.

I need to be able to convert this:

 "This is a <tag>string</tag>"

into

 "This is a {0}"

It must handle nested tags and similar cases. My plan was to use the following to extract the text to replace:

 var v = XDocument.Parse(string.Format("<root>{0}</root>", myString),LoadOptions.PreserveWhitespace);
 var ns = v.DescendantNodes();
 var n = "" + ns.OfType<XElement>().First(node => node.Name != "root");

That code returns the first pair of matching tags, and it handles nesting. The only real problem is that, even with the "PreserveWhitespace" option, carriage returns are dropped: "\r\n" becomes just "\n". The extracted text then no longer matches the original string, so:

 myString = myString.Replace(n,"{0}");

does not work as expected. I am looking for a way to make the replacement ignore whitespace differences, but I don't know where to begin. Any thoughts?


Solution

  • Input:

    string s = "This <tag id=\"1\">string <inner><tag></tag></inner></tag> is <p>inside <b>of</b> another</p> string";
    

    C# code:

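    // Requires: using System.Text.RegularExpressions;
    // Each pass collapses the last innermost element (one with no nested tags) into "{0}".
    Match m;
    do
    {
      // Group 1: text before, 2: the element, 3: its tag name, 4: text after.
      m = Regex.Match(s, @"\A([\s\S]*)(<(\S+)[^<>]*>[^<>]*</\3>)([\s\S]*)\Z");
      if (m.Success) {
        s = m.Groups[1].Value + "{0}" + m.Groups[4].Value;
        System.Console.WriteLine("Match: " + m.Groups[2].Value);
      }
    } while (m.Success); // stop once no element is left
    System.Console.WriteLine("Result: " + s);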
    

    Output:

    Match: <b>of</b>
    Match: <p>inside {0} another</p>
    Match: <tag></tag>
    Match: <inner>{0}</inner>
    Match: <tag id="1">string {0}</tag>
    Result: This {0} is {0} string
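
If you would rather keep the XDocument approach from the question, the mismatch between "\r\n" and "\n" can be sidestepped by matching the extracted text with a whitespace-tolerant regex instead of String.Replace. The following is only a minimal sketch: the class and method names are hypothetical, and it assumes that treating every run of whitespace as interchangeable is acceptable for your input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the name and signature are illustrative only.
public static class WhitespaceTolerantReplace
{
    // Replaces the first occurrence of `search` in `input`, treating each run of
    // whitespace in `search` as "one or more whitespace characters", so that
    // "\r\n" in the input still matches "\n" in the search text.
    public static string ReplaceFirst(string input, string search, string replacement)
    {
        // Split the search text on whitespace runs and escape each literal piece.
        string[] pieces = Regex.Split(search, @"\s+");
        string pattern = string.Join(@"\s+", pieces.Select(p => Regex.Escape(p)));

        // A MatchEvaluator keeps "$" in the replacement from being read as a substitution.
        return new Regex(pattern).Replace(input, _ => replacement, 1);
    }
}
```

With the variables from the question, the call would be `myString = WhitespaceTolerantReplace.ReplaceFirst(myString, n, "{0}");`. Note that this is looser than ignoring only carriage returns: a space in the search text will also match a tab or a newline in the input.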
    
