Tags: c#, regex, string, replace, regex-lookarounds

Replacing overlapping matches in a string (regex or string operations)


I am trying to find all occurrences of a substring in a given string, including overlapping ones, and then replace one specific occurrence of my choosing with another substring (the condition for choosing it is not important for the question).

The issue is that without a lookahead I can't find overlapping occurrences (e.g. searching for "aa" in "aaa" only finds the first "aa", because the second one overlaps with it):

var regex = new Regex(Regex.Escape("aa"));
regex.Matches("aaa").Count;

Value of the second line: 1. Expected: 2.

If I use a lookahead, I find all of the occurrences, but the replacement doesn't work (e.g. replacing "a" in "a" with "b" results in "ba" instead of "b"):

// Only the literal is escaped; escaping the whole pattern would disable the lookahead.
var regex = new Regex("(?=" + Regex.Escape("a") + ")");
regex.Replace("a", "b");

Replace result: ba. Expected: b.
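A minimal sketch of what seems to be going on, and one possible way around it: a lookahead match is zero-width, so `Regex.Replace` has nothing to overwrite and simply inserts the replacement in front of the match position. The lookahead is still useful for locating overlapping occurrences through `Match.Index`; the replacement can then be spliced in by index. The names `LookaheadDemo`, `OverlappingIndices` and `ReplaceAt` are illustrative assumptions, not part of the original code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Start index of every occurrence of needle, including overlapping ones.
    public static int[] OverlappingIndices(string input, string needle)
    {
        // The lookahead consumes no characters, so the engine tries every position.
        var regex = new Regex("(?=" + Regex.Escape(needle) + ")");
        return regex.Matches(input).Cast<Match>().Select(m => m.Index).ToArray();
    }

    // Replace length characters starting at index by splicing, not by Regex.Replace.
    public static string ReplaceAt(string input, int index, int length, string replacement)
    {
        return input.Remove(index, length).Insert(index, replacement);
    }

    public static void Main()
    {
        // The match is zero-width, which is why Replace inserts instead of substituting.
        Match m = Regex.Match("a", "(?=a)");
        Console.WriteLine(m.Index + " " + m.Length); // 0 0

        int[] hits = OverlappingIndices("aaa", "aa");
        Console.WriteLine(string.Join(",", hits)); // 0,1

        // Replace the second (overlapping) occurrence of "aa" with "b".
        Console.WriteLine(ReplaceAt("aaa", hits[1], 2, "b")); // ab
    }
}
```

Splicing by index keeps the "which occurrence" decision separate from the search, at the cost of one extra string allocation per replacement.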

Those are, of course, simple examples that showcase the issues, but I need this to work in the general case. I know that I could run both searches, or walk the string manually, but this code snippet is going to run many times and needs to be both efficient and readable.

Any ideas / tips on finding overlapping occurrences while still being able to replace properly? Should I even be using regex?


Solution

  • I would forgo regex and write a simple loop like the one below (there is room for improvement); I think it would be both quicker and easier to understand.

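            // Requires: using System.Collections.Generic;
            // Returns the start index of every occurrence of pattern in input,
            // including overlapping ones. pattern is assumed to be non-empty.
            public IEnumerable<int> FindStartingOccurrences(string input, string pattern)
            {
                var occurrences = new List<int>();

                // Stop once fewer than pattern.Length characters remain.
                for (int i = 0; i + pattern.Length <= input.Length; i++)
                {
                    // Ordinal comparison in place; avoids allocating a substring per position.
                    if (string.CompareOrdinal(input, i, pattern, 0, pattern.Length) == 0)
                    {
                        occurrences.Add(i);
                    }
                }

                return occurrences;
            }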
    

    and then call it like this:

    var occurrences = FindStartingOccurrences("aaabbaaaaaccaadaaa", "aa");
    // occurrences: 0, 1, 5, 6, 7, 8, 12, 15, 16
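Once the start positions are known, replacing one chosen occurrence is a matter of splicing the string at that position. The following is a rough sketch under stated assumptions rather than a definitive implementation: `OccurrenceReplacer` and `ReplaceNth` are hypothetical names, occurrences are counted from zero, and the search uses `String.IndexOf` with an ordinal comparison, restarting one character after each hit so that overlapping matches are counted.

```csharp
using System;

public static class OccurrenceReplacer
{
    // Hypothetical helper: replaces the n-th (0-based) occurrence of pattern,
    // counting overlapping occurrences. Returns input unchanged if there is no such occurrence.
    public static string ReplaceNth(string input, string pattern, string replacement, int n)
    {
        if (string.IsNullOrEmpty(pattern))
            throw new ArgumentException("pattern must be non-empty", nameof(pattern));
        if (n < 0)
            throw new ArgumentOutOfRangeException(nameof(n));

        int index = input.IndexOf(pattern, StringComparison.Ordinal);
        while (index >= 0 && n > 0)
        {
            // Restart one character after the previous hit so overlaps are not skipped.
            index = input.IndexOf(pattern, index + 1, StringComparison.Ordinal);
            n--;
        }

        if (index < 0)
            return input;

        // Splice: text before the hit, the replacement, text after the hit.
        return input.Substring(0, index) + replacement + input.Substring(index + pattern.Length);
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceNth("aaa", "aa", "b", 1)); // ab
        Console.WriteLine(ReplaceNth("aaabbaaaaaccaadaaa", "aa", "X", 2)); // aaabbXaaaccaadaaa
    }
}
```

If the occurrence is chosen by some condition rather than by its ordinal number, the same splice can be applied to whichever index from the occurrence list satisfies that condition.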