I want to find strings in a text. The text can contain multiple lines. The strings are delimited by custom delimiters, which should be parameterized. There can be multiple strings in the text, even on one line. For example, if the delimiter is three double quotation marks: """
then in this text:
lorem ipsum """findthis""" "but not this" 'nor this' """anotherstringtofind"""
""blabla"" """yet another""""""text to find"""
It should find: findthis, anotherstringtofind, yet another, text to find. (Note that the delimiters are not present in the matched strings, although I can remove them using C# if needed.)
I can already do something similar for single-character delimiters, with this regex: "[{0}](([^{0}])*)[{0}]"
Like this:
public static MatchCollection FindString(this string input, char delimiter, RegexOptions regexOptions = RegexOptions.Multiline)
{
    // Works only for single-character delimiters: each [{0}] is a character class.
    var regexString = string.Format("[{0}](([^{0}])*)[{0}]", delimiter);
    var rx = new Regex(regexString, regexOptions);
    MatchCollection matches = rx.Matches(input);
    return matches;
}
I guess the solution would use lookahead operators, but I could not figure out how to combine them with something that has an effect similar to [^] for single characters. Is it even possible to "negate" a whole sequence of characters (so that they do not end up in the matches)?
I think this question is similar, but I'm not familiar with Python.
Some clarification: my expectation is that each delimiter pair is used exactly once. So, for example, this test should pass:
var inputText = "??abc?? ??def?? ??xyz??";
var matches = inputText.FindString("??", RegexOptions.Singleline);
Assert.Equal(3, matches.Count);
Is it possible to solve this in C# using regex? Thank you in advance!
You can use a lazy quantifier instead of a negated character class. In your example with """, this gives a regex like """(.*?)"""
Also, notice that your current attempt incorrectly uses character classes for the delimiters: ["""] is equivalent to ["], which in turn is equivalent to a plain ". Use your delimiter as is, without any additional wrappers.
But don't forget to escape your delimiter before using it in a regex, so that a delimiter like [] becomes \[\] in the pattern.
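To make the two points above concrete, here is a minimal sketch that uses only Regex.IsMatch, Regex.Escape and Regex.Match from System.Text.RegularExpressions. The sample strings and variable names are made up for illustration. Note that Regex.Escape escapes only the opening bracket, which is still sufficient, because an unpaired ] is treated as a literal in .NET regexes.

```csharp
using System;
using System.Text.RegularExpressions;

// ["""] is a character class containing one distinct character, so it matches a single ".
bool classMatchesOneQuote = Regex.IsMatch("\"", "^[\"\"\"]$");
Console.WriteLine(classMatchesOneQuote); // True

// Regex.Escape makes metacharacters in the delimiter literal.
string escapedBrackets = Regex.Escape("[]");
string escapedQuestionMarks = Regex.Escape("??");
Console.WriteLine(escapedBrackets);      // \[]
Console.WriteLine(escapedQuestionMarks); // \?\?

// The escaped delimiter can then be used directly around a lazy capture group.
Match m = Regex.Match("x[]abc[]y", escapedBrackets + "(.*?)" + escapedBrackets);
Console.WriteLine(m.Groups[1].Value);    // abc
```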
Your method would look like this:
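public static MatchCollection FindString(string input, string delimiter, RegexOptions regexOptions = RegexOptions.Multiline)
{
    // Escape the delimiter so regex metacharacters in it are matched literally;
    // (.*?) lazily captures the text up to the next delimiter.
    string pattern = string.Format("{0}(.*?){0}", Regex.Escape(delimiter));
    var rx = new Regex(pattern, regexOptions);
    return rx.Matches(input);
}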
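Each Match returned by this method still includes the delimiters in its Value; the text between them is in capture group 1. As a rough sketch, the variant below applies the same pattern to the sample text from the question and returns only the captured contents. FindContents is a hypothetical helper name, not part of the method above, and it defaults to RegexOptions.None as an assumption.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: same pattern as above, but returns only capture group 1.
string[] FindContents(string input, string delimiter, RegexOptions options = RegexOptions.None)
{
    string pattern = string.Format("{0}(.*?){0}", Regex.Escape(delimiter));
    return Regex.Matches(input, pattern, options)
                .Cast<Match>()
                .Select(match => match.Groups[1].Value)
                .ToArray();
}

// The two-line sample text from the question.
string text =
    "lorem ipsum \"\"\"findthis\"\"\" \"but not this\" 'nor this' \"\"\"anotherstringtofind\"\"\"\n" +
    "\"\"blabla\"\" \"\"\"yet another\"\"\"\"\"\"text to find\"\"\"";

string[] found = FindContents(text, "\"\"\"");
Console.WriteLine(string.Join(", ", found));
// findthis, anotherstringtofind, yet another, text to find

// The "??" case from the question: three delimiter pairs, three matches.
string[] questionMarks = FindContents("??abc?? ??def?? ??xyz??", "??");
Console.WriteLine(questionMarks.Length); // 3
```

If a delimited string may itself span several lines, passing RegexOptions.Singleline lets the dot match newline characters as well; RegexOptions.Multiline only changes how ^ and $ behave.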
Is it even possible to "negate" a whole sequence of characters
Yes, it is possible: (?:(?!foo).)+ matches one or more characters without ever running into foo. For your example, that gives """(?:(?!""").)*""". But it tends to perform worse than the simple lazy quantifier, because the lookahead is evaluated before every single character.
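As a quick sanity check, the sketch below runs both patterns over part of the sample text and compares the captured contents. The extra capture group around the lookahead construct and the names lazyPattern, temperedPattern and Contents are additions for illustration only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string d = Regex.Escape("\"\"\"");

// Lazy quantifier: """(.*?)"""
string lazyPattern = d + "(.*?)" + d;
// Negative lookahead before each character: """((?:(?!""").)*)"""
string temperedPattern = d + "((?:(?!" + d + ").)*)" + d;

string sample = "\"\"\"yet another\"\"\"\"\"\"text to find\"\"\"";

// Collects the contents of capture group 1 for every match of the pattern.
string[] Contents(string pattern) =>
    Regex.Matches(sample, pattern)
         .Cast<Match>()
         .Select(match => match.Groups[1].Value)
         .ToArray();

string[] lazy = Contents(lazyPattern);
string[] tempered = Contents(temperedPattern);

Console.WriteLine(string.Join(" | ", tempered)); // yet another | text to find
Console.WriteLine(lazy.SequenceEqual(tempered)); // True
```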