c# regex replace case-insensitive

What are the things to watch out for with case insensitive regex replace?


I have written the following code to do a case-insensitive replace in C#:

Regex.Replace(textBoxText, 
    Regex.Escape(findText), 
    replaceText, 
    RegexOptions.IgnoreCase);

I just want to check whether this is the right approach, whether there is a better one, and whether I'm overlooking something I should be aware of.

Note: Please don't give me hand-crafted code. I used a fast replace function from CodeProject, and that code crashes on the client side, and I have no way of knowing what input the user supplied. So I prefer a simple but correct and reliable method.


Solution

  • Your code seems OK, but remember that case-insensitive matching like this uses the current locale (culture). It is usually better to specify the culture you want, or to let the user select it. RegexOptions.CultureInvariant is usually a good general choice because it behaves the same in every locale:

    Regex.Replace(textBoxText, 
        Regex.Escape(findText), 
        replaceText, 
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
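
    As a minimal sketch, the call above could be wrapped in a small helper so that every call site gets the same options. The class and method names here (CaseInsensitiveReplace, ReplaceInvariant) are made up for illustration and are not part of any library:

    ```csharp
    using System.Text.RegularExpressions;

    public static class CaseInsensitiveReplace
    {
        // Hypothetical helper: replaces every occurrence of findText (taken
        // literally, thanks to Regex.Escape), ignoring case and using
        // invariant-culture casing rules instead of the current culture.
        // Note: replaceText is still a regex replacement pattern, so "$"
        // sequences in it are interpreted as substitutions.
        public static string ReplaceInvariant(string input, string findText, string replaceText)
        {
            return Regex.Replace(
                input,
                Regex.Escape(findText),
                replaceText,
                RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
        }
    }
    ```

    For example, CaseInsensitiveReplace.ReplaceInvariant("FILE file File", "file", "x") returns "x x x" whatever the current culture is. Without CultureInvariant, the same call may behave differently under a culture such as tr-TR, where "I" and "i" are not treated as a case pair.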
    

    To use a specific other locale, you need to do a bit more hocus pocus, because the regex picks up the culture of the current thread:

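    // requires: using System.Globalization; using System.Threading;
    //           using System.Text.RegularExpressions;

    // remember the current culture
    CultureInfo originalCulture = Thread.CurrentThread.CurrentCulture;
    string result;
    try
    {
        // set the user-selected culture here (in place of "en-US")
        Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("en-US");

        // do the regex replace under that culture
        result = Regex.Replace(textBoxText,
            Regex.Escape(findText),
            replaceText,
            RegexOptions.IgnoreCase);
    }
    finally
    {
        // always restore the original culture, even if the replace throws
        Thread.CurrentThread.CurrentCulture = originalCulture;
    }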
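
    The culture switch can also be packaged as a reusable method. This is only a sketch: CultureScopedReplace and its Replace method are hypothetical names, and it assumes the caller passes in the CultureInfo to use:

    ```csharp
    using System.Globalization;
    using System.Text.RegularExpressions;
    using System.Threading;

    public static class CultureScopedReplace
    {
        // Hypothetical helper: runs a case-insensitive literal replace with
        // the given culture set on the current thread, then restores the
        // culture that was active before the call.
        public static string Replace(string input, string findText, string replaceText, CultureInfo culture)
        {
            CultureInfo originalCulture = Thread.CurrentThread.CurrentCulture;
            try
            {
                Thread.CurrentThread.CurrentCulture = culture;
                return Regex.Replace(
                    input,
                    Regex.Escape(findText),
                    replaceText,
                    RegexOptions.IgnoreCase);
            }
            finally
            {
                // Runs on success and on exceptions alike.
                Thread.CurrentThread.CurrentCulture = originalCulture;
            }
        }
    }
    ```

    A call might look like CultureScopedReplace.Replace(textBoxText, findText, replaceText, CultureInfo.CreateSpecificCulture("en-US")). The finally block is there so that a failing replace cannot leave the thread running under the wrong culture.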
    

    Note that you can switch case insensitivity on or off inside the pattern with the inline options (?i) and (?-i). These set the state rather than toggle it, which means that:

    // these three statements are equivalent and yield the same results:
    Regex.Replace("tExT", "[a-z]", "", RegexOptions.IgnoreCase);
    Regex.Replace("tExT", "(?i)[a-z]", "", RegexOptions.IgnoreCase);
    Regex.Replace("tExT", "(?i)[a-z]", "");
    
    // once IgnoreCase is used, this switches it off for the whole expression...
    Regex.Replace("tExT", "(?-i)[a-z]", "", RegexOptions.IgnoreCase);
    
    //...and this can switch it off for only a part of the expression:
    Regex.Replace("tExT", "(?:(?-i)[a-z])", "", RegexOptions.IgnoreCase);
    

    The last one is interesting: because (?-i) appears inside the non-capturing group (?:...), it is only effective up to that group's closing parenthesis; after the group, the outer setting applies again. You can use this as often as you like in an expression. An inline switch used outside any group stays in effect until the next case-sensitivity switch, or to the end of the expression.
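
    To make the scoping visible, here is a small sketch with a pattern that has a part outside the group. InlineCaseOptions and Mark are hypothetical names; the helper replaces each match with "_" while IgnoreCase is set for the whole expression:

    ```csharp
    using System.Text.RegularExpressions;

    public static class InlineCaseOptions
    {
        // Hypothetical helper: marks every match of pattern with "_",
        // with IgnoreCase switched on globally via RegexOptions.
        public static string Mark(string input, string pattern)
        {
            return Regex.Replace(
                input,
                pattern,
                "_",
                RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
        }
    }
    ```

    With the input "ab aB Ab AB":

    Mark(input, "ab") gives "_ _ _ _" (everything matches),
    Mark(input, "(?-i)ab") gives "_ aB Ab AB" (switched off for the whole expression),
    Mark(input, "(?:(?-i)a)b") gives "_ _ Ab AB" (only the "a" inside the group is case-sensitive),
    Mark(input, "a(?-i)b") gives "_ aB _ AB" (the switch applies from where it appears to the end).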

    Update: I made the wrong assumption that you can't do case-sensitivity switching. The text above is edited with this in mind.