Find a specific word in a regex along with a special character


string emailBody = " holla holla testing is for NewFinancial History:\"xyz\"  dsd  NewFinancial History:\"abc\"  NewEBTDI$:\"abc\"  dsds  ";

emailBody = string.Join(" ", Regex.Split(emailBody.Trim(), @"(?:\r\n|\n|\r)"));
var keys = Regex.Matches(emailBody, @"\bNew\B(.+?):", RegexOptions.Singleline)
    .OfType<Match>()
    .Select(m => m.Groups[0].Value.Replace(":", ""))
    .Distinct()
    .ToArray();
foreach (string key in keys)
{
    List<string> valueList = new List<string>();
    string regex = "" + key + ":" + "\"(?<" + GetCleanKey(key) + ">[^\"]*)\"";

    var matches = Regex.Matches(emailBody, regex, RegexOptions.Singleline);
    foreach (Match match in matches)
    {
        if (match.Success)
        {
            string value = match.Groups[GetCleanKey(key)].Value;
            if (!valueList.Contains(value.Trim()))
            {
                valueList.Add(value.Trim());
            }
        }
    }
}

public string GetCleanKey(string key)
{
    return key.Replace(" ", "").Replace("-", "").Replace("#", "").Replace("$", "").Replace("*", "").Replace("!", "").Replace("@", "")
        .Replace("%", "").Replace("^", "").Replace("&", "").Replace("(", "").Replace(")", "").Replace("[", "").Replace("]", "").Replace("?", "")
        .Replace("<", "").Replace(">", "").Replace("'", "").Replace(";", "").Replace("/", "").Replace("\"", "").Replace("+", "").Replace("~", "").Replace("`", "")
        .Replace("{", "").Replace("}", "").Replace("+", "").Replace("|", "");
}

In the code above I am trying to get the value next to NewEBTDI$:, which is "abc".

When the key in the pattern includes the $ sign, the value next to the field name is not found.

If the $ is removed and the key is just NewEBTDI, the value is found.

I want to match the key together with the $ sign.


Solution

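  • Characters that have a special meaning in regex but must be matched literally have to be escaped. You can do this with Regex.Escape. In your case the culprit is the $ sign: unescaped, it is an anchor that matches the end of the input (or the end of a line with RegexOptions.Multiline), so the pattern NewEBTDI$: can never match, because a ':' can never follow the end of the input.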

    string regex = "" + Regex.Escape(key) + ":" + "\"(?<" + Regex.Escape(GetCleanKey(key))
                   + ">[^\"]*)\"";
    

    or

    string regex = String.Format("{0}:\"(?<{1}>[^\"]*)\"",
                                 Regex.Escape(key),
                                 Regex.Escape(GetCleanKey(key)));
    

    or, with C# 6 (Visual Studio 2015) or later, using string interpolation:

    string regex = $"{Regex.Escape(key)}:\"(?<{Regex.Escape(GetCleanKey(key))}>[^\"]*)\"";
    

    (It looks better than that in the editor, because the C# editor colors the string parts and the embedded C# expressions differently.)
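
To see the effect of escaping in isolation, here is a minimal, self-contained sketch. The class name EscapeDemo, the helper FindValue, and the fixed group name "value" are illustrative assumptions and not part of the original code; the sample text is a shortened version of the string from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: returns the quoted value that follows "key:",
    // or null when the pattern does not match.
    public static string FindValue(string body, string key, bool escape)
    {
        // Regex.Escape turns "$" into "\$", so it is matched as a literal character.
        string keyPattern = escape ? Regex.Escape(key) : key;
        string pattern = keyPattern + ":\"(?<value>[^\"]*)\"";

        Match match = Regex.Match(body, pattern, RegexOptions.Singleline);
        return match.Success ? match.Groups["value"].Value : null;
    }

    public static void Main()
    {
        string emailBody = "NewFinancial History:\"xyz\" NewEBTDI$:\"abc\"";

        // Unescaped: "$" acts as an end-of-input anchor, so "$:" cannot match.
        Console.WriteLine(FindValue(emailBody, "NewEBTDI$", false) ?? "(no match)");

        // Escaped: the key is matched literally and the value is captured.
        Console.WriteLine(FindValue(emailBody, "NewEBTDI$", true));
    }
}
```

Run as written, the first line printed should be "(no match)" and the second "abc". Using one fixed group name also avoids building a group name from the key, which is a design choice of this sketch rather than something the original code requires.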