I have been trying to find all occurrences of a substring in a given string and replace one specific occurrence with another substring (how I choose it is not important for the question). What I need is to find all occurrences, even overlapping ones, and to be able to easily replace the one I choose.
The issue is that without a lookahead I can't find overlapping occurrences (e.g. searching for "aa" in "aaa" only finds the first "aa", because the second one overlaps with it):
var regex = new Regex(Regex.Escape("aa"));
int count = regex.Matches("aaa").Count;
Value of count: 1. Expected: 2.
If I use a lookahead I find all of the occurrences, but the replacement doesn't work (e.g. replacing "a" in "a" with "b" results in "ba" instead of "b"):
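// Escape only the literal text; escaping the whole "(?=a)" would turn the lookahead into literal characters.
var regex = new Regex("(?=" + Regex.Escape("a") + ")");
string result = regex.Replace("a", "b");
Value of result: ba. Expected: b.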
Those are, of course, simple examples that showcase the issues, but I need this to work on any input. I know that I could run both searches, or walk the string manually, but this code is going to run many times and needs to be both efficient and readable.
Any ideas / tips on finding overlapping occurrences while still being able to replace properly? Should I even be using regex?
I would forgo regex and write a simple loop like the one below (there is room for improvement), because I think it would be quicker and easier to understand.
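// Requires: using System.Collections.Generic;
// Returns the start index of every occurrence of pattern in input, including overlapping ones.
public IEnumerable<int> FindStartingOccurrences(string input, string pattern)
{
    var occurrences = new List<int>();
    for (int i = 0; i < input.Length; i++)
    {
        // Stop once pattern would run past the end of input.
        if (i + pattern.Length > input.Length)
        {
            break;
        }
        // Ordinal comparison in place; avoids allocating a substring per index.
        if (string.CompareOrdinal(input, i, pattern, 0, pattern.Length) == 0)
        {
            occurrences.Add(i);
        }
    }
    return occurrences;
}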
and then call it like this:
var occurrences = FindStartingOccurrences("aaabbaaaaaccaadaaa", "aa");
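Once you have the start indices, replacing the occurrence you choose is plain string manipulation. Below is a minimal sketch of that step, not a tuned implementation. The names OverlapDemo, FindWithLookahead and ReplaceAt are made up for illustration. FindWithLookahead shows that a lookahead regex can supply the same indices through Match.Index if you would rather stay with regex; the loop above works just as well as the source of indices.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OverlapDemo
{
    // Hypothetical helper: start indices of all matches, overlapping ones included.
    // The lookahead is zero-width, so it consumes no characters and the engine
    // tests every position in the input.
    public static List<int> FindWithLookahead(string input, string pattern)
    {
        var indices = new List<int>();
        if (string.IsNullOrEmpty(pattern))
        {
            return indices; // assumption: an empty pattern yields no occurrences
        }
        var regex = new Regex("(?=" + Regex.Escape(pattern) + ")");
        foreach (Match m in regex.Matches(input))
        {
            indices.Add(m.Index);
        }
        return indices;
    }

    // Hypothetical helper: replaces the oldLength characters starting at index
    // with replacement, leaving the rest of the string untouched.
    public static string ReplaceAt(string input, int index, int oldLength, string replacement)
    {
        return input.Remove(index, oldLength).Insert(index, replacement);
    }
}
```

For example, OverlapDemo.FindWithLookahead("aaa", "aa") yields the indices 0 and 1, and OverlapDemo.ReplaceAt("aaa", 1, 2, "b") returns "ab". If you replace more than one occurrence, work from the highest index down so the earlier indices stay valid, and keep in mind that replacing one occurrence can destroy another that overlaps it.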