c# string split string-parsing

Custom parsing string


When parsing an FTX (free text) string, I need to split it using + as a delimiter, but only when the + is not preceded by an escape character (say, ?). So the string nika ?+ marry = love+sandra ?+ alex = love should be parsed into two strings: nika + marry = love and sandra + alex = love. Using String.Split('+') is obviously not enough. Can I achieve this somehow?

One way, it seems to me, is to replace occurrences of ?+ with some unique character (or sequence of characters), say, @#@, split using + as a delimiter, and then replace @#@ with + again, but that's unreliable and wrong in every way I can think of.

? is used as an escape character only in combination with either : or +; in any other case it's treated as a regular character.


Solution

  • A horrible regular expression to split it:

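    // Requires: using System.Text.RegularExpressions;
    string str = "nika ?+ marry = love??+sandra ???+ alex = love";
    string[] splitted = Regex.Split(str, @"(?<=(?:^|[^?])(?:\?\?)*)\+");
    // splitted: { "nika ?+ marry = love??", "sandra ???+ alex = love" }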
    

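    It splits on a + (\+) that is preceded by the beginning of the string (^) or a non-? character ([^?]), followed by an even number of ? characters, possibly zero ((?:\?\?)*). An even run of ? means every ? is paired off as an escaped ?, so the + itself is not escaped; an odd run means the last ? escapes the +. The pattern makes liberal use of (?:) (non-capturing groups) because Regex.Split includes the text matched by any capturing group in the resulting array.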

    Note that I'm not unescaping anything! So in the end ?+ remains ?+.
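
If the unescaped strings from the question are what you want, a minimal sketch of the missing step could look like the following. The names FtxParser and SplitAndUnescape are made up for illustration. It also assumes that ?? stands for a literal ?, which is what the even-number check in the split pattern implies but which the question does not state. Unescaping is done in one left-to-right pass so that a sequence like ??+ is not misread.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class FtxParser
{
    // '+' preceded by an even number (possibly zero) of '?' characters.
    private static readonly Regex Delimiter =
        new Regex(@"(?<=(?:^|[^?])(?:\?\?)*)\+");

    // '?' followed by a character it can escape.
    // Assumption: "??" is an escaped '?', in addition to "?+" and "?:".
    private static readonly Regex Escape = new Regex(@"\?([?+:])");

    public static string[] SplitAndUnescape(string input)
    {
        return Delimiter.Split(input)
                        // Replace each escape pair with the escaped character alone.
                        .Select(part => Escape.Replace(part, "$1"))
                        .ToArray();
    }
}
```

Calling FtxParser.SplitAndUnescape on the string from the question, nika ?+ marry = love+sandra ?+ alex = love, gives nika + marry = love and sandra + alex = love. If the pieces still have to be split on : afterwards, the unescaping should probably be moved after that second split, otherwise an escaped ?: would be indistinguishable from a real separator.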