c# regex testcomplete

Regex not returning what I ask it to select


I have a string like below:

MSH|^~\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|QPC0amoCwk+2uSHidYKB+Q|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^
MSA|AA|1234

I want to use a regex to replace everything between K11| and |P. The text between them changes from message to message. I thought this was straightforward enough, but I can't get it to work.

I have tried var regEx5 = /K11\|\w*\|P/g and then using that regex to replace the text. The regex is bringing back QPC0amoCHidY though. I can't understand why it is doing this. Is it because the string contains the + symbol? I'm at a loss.

I also tried /K11\|[^|]*\|P/g and /K11\|(.*?)\|P/g with no joy.

Code that is doing the regex and the replace:

var regEx5 = /K11\|([^|]+)\|P/g;
newText1 = newText1["replace"](regEx5, "K11|<IGNORE>|P");

Solution

  • To replace the text that occurs between two other strings, a common approach is to capture the two bounding strings and have the replacement expression put them back with the new text in the middle.

    The regex (K11\|).*(\|P) captures the K11| and the |P in groups 1 and 2. The text between them is matched by the .* but is not captured.

    The question is not clear on what the replacement should be, so let's assume that it is NewText.

    The replacement expression should then be \1NewText\2 or $1NewText$2, depending on the regex flavour being used (.NET uses the $1 form).

    C# code to perform the change could be as follows. Note that the backslash characters need to be doubled when they are put into regular C# string literals.

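    using System;
    using System.Text.RegularExpressions;

    // Sample line from the question; backslashes are doubled in a regular string literal.
    string source = "MSH|^~\\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|QPC0amoCwk+2uSHidYKB+Q|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^";
    // Groups 1 and 2 capture the delimiters; the .* between them is matched but not captured.
    string regex = "(K11\\|).*(\\|P)";
    // $1 and $2 put the captured delimiters back around the new text.
    string replace = "$1NewText$2";
    string output = Regex.Replace(source, regex, replace);
    
    Console.WriteLine($"Was: '{source}'");
    Console.WriteLine($"Now: '{output}'");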
    

    The output from this code is:

    Was: 'MSH|^~\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|QPC0amoCwk+2uSHidYKB+Q|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^'
    Now: 'MSH|^~\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|NewText|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^'
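
One caveat with the pattern above: .* is greedy, so if a line happens to contain a second |P further along, the match runs up to the last one. The sketch below illustrates this in JavaScript (the syntax of the question's own snippet). The input line is made up for the demonstration and is not from the question; restricting the middle part to [^|]* is one possible way to keep the match inside a single field.

```javascript
// Hypothetical line with a second "|P" later on, invented for illustration only.
const line = "RSP^K11^RSP_K11|abc+123|P|2.5.1|XYZ|P|end";

// Greedy .* backtracks only as far as the LAST "|P" on the line.
const greedy = line.replace(/(K11\|).*(\|P)/, "$1NewText$2");

// [^|]* cannot cross a field separator, so the match stays in one field.
const bounded = line.replace(/(K11\|)[^|]*(\|P)/, "$1NewText$2");

console.log(greedy);  // RSP^K11^RSP_K11|NewText|P|end
console.log(bounded); // RSP^K11^RSP_K11|NewText|P|2.5.1|XYZ|P|end
```

On the sample line from the question both patterns give the same result, because it contains only one |P after K11|.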
    

    A comment on the question states that

    K11\|(.*)\|P still returns QPC0amoCHidY

    The text QPC0amoCHidY is only part of the string between K11| and |P. In this regex the captured text is the text that should be replaced, so the original K11| and |P are lost unless the replacement puts them back. I do not know why the rest of the text between the two strings (the reported value is missing wk+2uS and KB+Q) does not appear, but I suspect that something extra is being done in the code.
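
As a rough check of that suspicion, the sketch below runs two of the question's patterns against the sample line in plain JavaScript. The variable names are mine, and it assumes a standard JavaScript engine rather than whatever environment the original code runs in. It suggests that the + symbol only matters for the \w* attempt, since \w does not match +, while the [^|]+ version captures the whole field.

```javascript
// Sample line from the question (String.raw keeps the backslash literal).
const source = String.raw`MSH|^~\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|QPC0amoCwk+2uSHidYKB+Q|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^`;

// \w matches only [A-Za-z0-9_], so it stops at the first "+" and the pattern fails.
const wordOnly = /K11\|\w*\|P/g;
// [^|]+ accepts any character except the field separator, "+" included.
const anyButPipe = /K11\|([^|]+)\|P/g;

const wordOnlyMatches = source.match(wordOnly);
const captured = [...source.matchAll(anyButPipe)].map(m => m[1]);
const replaced = source.replace(anyButPipe, "K11|<IGNORE>|P");

console.log(wordOnlyMatches); // null
console.log(captured);        // [ 'QPC0amoCwk+2uSHidYKB+Q' ]
console.log(replaced);        // ...RSP_K11|<IGNORE>|P|2.5.1...
```

If the same pattern behaves differently in the real script, the difference most likely comes from how the input string is produced or from other processing around the replace call, not from the regex itself.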