Search code examples
.net-4.0string-comparison

When should Regex be used over String.IndexOf()? or String.Contains()?


I'm, currently working my first project in .NET 4.0 and it requires several thousand string comparisons (I'm searching directories and sometimes entire drives for certain files). For the most part, the strings are quite short because I'm only looking at file paths so I have just made use of String.Contains() to see if the file path string contains my needle string.

I was wondering though, would Regex be a better idea? At what point will the Regex be faster than a standard string comparison? Is it based on the length of the strings being compared or the number of strings being compared?


Solution

  • It's variable. Comparison performance is a complex function of the input data, the culture being used for comparing, case sensitivity and CompareOptions. A Regex object is more expensive to instantiate (unless it's in the Regex cache), so if you're doing a lot of one off comparisons, it not that great to use and I've found it's typically slower than IndexOf(), but YMMV.

    Keep in mind that when using Contains/IndexOf that the culture under which the user/thread is running will decide how the comparison is done. That can have a significant impact on performance. Not all cultures are as fast.

    The Invariant culture is a very fast culture. If you use a CompareInfo directly, rather than doing String.IndexOf(), it will be somewhat faster still.

    CultureInfo.InvariantCulture.CompareInfo.IndexOf(..)
    

    The only way to have some confidence in making the right choice is to benchmark. That said, unless you're shifting through many megabytes of strings, it won't make a difference that matters to anyone. As ChrisF said earlier, focus on readable/maintainble code in that case.

    Here's a good article on getting the most out of regex: Optimizing Regular Expression Performance