c#string-operations

How to replace a partially unknown string



I need to replace (or better, delete) a string where I know the beginning and the end.
Some characters are unknown, as is the length of the string.
Of course I could work with Substring and other C# string operations, but isn't there a simple wildcard option for Replace?

mystring.Replace("O(*)", "");

would be a nice option.
I know that the string begins with O( and ends with ).
It's possible that the string looks like O(something);QG(anything else)
Here the result should be ;QG(anything else)

Is this possible with a simple replace?
And what about the more advanced case, where the string occurs more than once, like here:
O(something);O(someone);QG(anything else)


Solution

  • Take a look at regular expressions.

    The following will meet this case:

    var result = Regex.Replace(originalString, @"O\(.*?\)", "");
    

    What it means:

    • @ - makes this a verbatim string, so C# does not treat \ as an escape character. Otherwise the compiler would see our \( and try to replace it with another character, as it does for \n becoming a newline (and there is no \( escape, so it's a compiler error). Regex also uses \ as its escape character, so without the @ every backslash meant for the regex would have to be written as a doubled C# backslash, which makes regex patterns more confusing
    • " start of c# string
    • O\( literal character O followed by literal character ( (brackets have special meaning in regex, so the backslash disables that special meaning)
    • .*? match zero or more of any character (lazy, also called non-greedy)
    • \) literal )
    • " end of string
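
    To make the breakdown above concrete, here is a minimal sketch that runs the pattern against the two inputs from the question. The class and method names (OBlockRemover, RemoveOBlocks) are made up for illustration; only Regex.Replace comes from the answer itself.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class OBlockRemover
    {
        // Verbatim string: the backslashes reach the regex engine unchanged.
        private const string Pattern = @"O\(.*?\)";

        // Hypothetical helper: removes every O(...) block from the input.
        public static string RemoveOBlocks(string input)
        {
            return Regex.Replace(input, Pattern, "");
        }

        public static void Main()
        {
            // The same pattern as a regular string needs doubled backslashes.
            Console.WriteLine(Pattern == "O\\(.*?\\)"); // True

            // Single occurrence.
            Console.WriteLine(RemoveOBlocks("O(something);QG(anything else)"));
            // prints ;QG(anything else)

            // Multiple occurrences: Regex.Replace replaces all matches.
            Console.WriteLine(RemoveOBlocks("O(something);O(someone);QG(anything else)"));
            // prints ;;QG(anything else)
        }
    }
    ```

    Note that the separators between the removed blocks stay behind, so the second input leaves two semicolons.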

    .*? is a complex thing that warrants a bit more explanation:

    In regex . means "match any single character" (by default, any character except a newline), and * means "zero or more of the preceding element". Put together, .* means "zero or more of any character".
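
    A small sketch of what "zero or more of any character" means in practice, assuming the unknown part might be empty or might contain a line break. DotStarDemo and Matches are illustrative names; RegexOptions.Singleline is the standard .NET option that lets . match a newline as well.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class DotStarDemo
    {
        // Hypothetical helper: does the input contain an O(...) block?
        public static bool Matches(string input, RegexOptions options = RegexOptions.None)
        {
            return Regex.IsMatch(input, @"O\(.*?\)", options);
        }

        public static void Main()
        {
            Console.WriteLine(Matches("O()"));     // True: * allows zero characters
            Console.WriteLine(Matches("O(abc)"));  // True
            Console.WriteLine(Matches("O(a\nb)")); // False: . stops at the newline by default
            Console.WriteLine(Matches("O(a\nb)", RegexOptions.Singleline)); // True
        }
    }
    ```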

    So what's the ? for?

    By default regex * is "greedy": a .* will eat the rest of the input string and then start working backwards, spitting characters back out and checking for a match. Suppose you had two in succession, as in your example:

    K(hello);O(mystring);O(otherstring);L(byebye)
    

    If you match it greedily, then O\(.*\) will match the initial O(, then consume all the remaining input, then spit one trailing ) back out and declare it's found a match, so the .* matches mystring);O(otherstring);L(byebye

    We don't want this. Instead we want it to work forwards a character at a time, looking for the first matching ). Putting the ? after the * changes from greedy mode to lazy mode, and the input is scanned forwards rather than zipping to the end and scanning backwards. This means that O\(.*?\) matches O(mystring) and then later O(otherstring), leaving a result of K(hello);;;L(byebye), rather than K(hello);
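
    The difference is easy to check side by side. The following is a sketch using the example string from above; GreedyVsLazy, RemoveGreedy and RemoveLazy are names invented for this comparison.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class GreedyVsLazy
    {
        // Greedy: .* runs to the last ) in the input.
        public static string RemoveGreedy(string input)
        {
            return Regex.Replace(input, @"O\(.*\)", "");
        }

        // Lazy: .*? stops at the first ) after each O(.
        public static string RemoveLazy(string input)
        {
            return Regex.Replace(input, @"O\(.*?\)", "");
        }

        public static void Main()
        {
            const string input = "K(hello);O(mystring);O(otherstring);L(byebye)";

            // What the greedy pattern actually swallows.
            Console.WriteLine(Regex.Match(input, @"O\(.*\)").Value);
            // prints O(mystring);O(otherstring);L(byebye)

            Console.WriteLine(RemoveGreedy(input)); // prints K(hello);
            Console.WriteLine(RemoveLazy(input));   // prints K(hello);;;L(byebye)
        }
    }
    ```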