I have a string like below:
MSH|^~\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|QPC0amoCwk+2uSHidYKB+Q|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^
MSA|AA|1234
I want to use regex to replace everything between K11| and |P. The string between these changes.
I thought this was straightforward enough, but I can't get it to work.
I have tried var regEx5 = /K11\|\w*\|P/g
and then using that regex to replace the text. The regex is bringing back QPC0amoCHidY
though. I can't understand why it is doing this. Is it because the string contains the +
symbol? I'm at a loss.
I also tried /K11\|[^|]*\|P/g
and /K11\|(.*?)\|P/g
with no joy.
Code that is doing the regex and the replace:
var regEx5 = /K11\|([^|]+)\|P/g
newText1 = newText1["replace"](regEx5, "K11|<IGNORE>|P");
To replace a string that occurs between two other strings, a common approach is to capture the two bounding strings and have the replacement expression put them back with the new text in the middle.
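The RegEx (K11\|).*(\|P) captures K11| and |P in groups 1 and 2. The text between them is matched by the .* but it is not captured. Note that .* is greedy, so it runs to the last |P on the line; [^|]* can be used instead to stop at the next | character.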
The question is not clear on what the replacement should be, so let's assume that it is NewText. The replacement expression should then be \1NewText\2 or $1NewText$2, depending on the exact RegEx flavor being used.
C# code to perform the change could be as follows. Note that the backslash characters in the strings need to be doubled when putting them into C# strings.
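using System;
using System.Text.RegularExpressions;

string source = "MSH|^~\\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|QPC0amoCwk+2uSHidYKB+Q|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^";
// Groups 1 and 2 capture the bounding strings; the text between them is matched but not captured.
string regex = "(K11\\|).*(\\|P)";
// Put both captured strings back with the new text in the middle.
string replace = "$1NewText$2";
string output = Regex.Replace(source, regex, replace);
Console.WriteLine($"Was: '{source}'");
Console.WriteLine($"Now: '{output}'");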
The output from this code is:
Was: 'MSH|^~\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|QPC0amoCwk+2uSHidYKB+Q|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^'
Now: 'MSH|^~\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|NewText|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^'
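Since the code in the question is JavaScript, the same idea can be sketched there as well. This is only an illustration: the helper name replaceBetween is made up for this example, and it uses [^|]* rather than .* so the match cannot run past the next field separator. A replacer function is used so that any $ characters in the new text are not treated as group references.

```javascript
// Sample message from the question. String.raw keeps the backslash as typed.
const source = String.raw`MSH|^~\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|QPC0amoCwk+2uSHidYKB+Q|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^`;

// Hypothetical helper: keep the two bounding strings (groups 1 and 2)
// and swap only the text between them.
function replaceBetween(text, replacement) {
  // [^|]* matches any run of characters that are not a pipe,
  // so characters such as + are included.
  return text.replace(
    /(K11\|)[^|]*(\|P)/g,
    (match, before, after) => before + replacement + after
  );
}

const output = replaceBetween(source, "NewText");
console.log(output);
```

With the sample message, the field after K11| becomes NewText and the rest of the line is left as it was.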
A comment on the question states that K11\|(.*)\|P still returns QPC0amoCHidY, where the text QPC0amoCHidY is part of the string between K11| and |P. In this RegEx the text that is captured is the text that should be replaced, so the original K11| and |P are lost unless the replacement puts them back. I do not know why the rest of the text between the two strings (i.e. the wk+2uS and KB+Q pieces around each + sign) does not appear, but I suspect that something extra is being done in the code.
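On the question of whether the + symbol matters, a quick check in JavaScript may help narrow things down. This is a minimal sketch using the sample message from the question; the variable names are only for illustration. Since \w matches only letters, digits and underscore, the \w* pattern cannot get past the first + and so does not match at all, while [^|]* captures the whole field.

```javascript
// Sample message from the question. String.raw keeps the backslash as typed.
const source = String.raw`MSH|^~\&|dgdgd|MSH6TOMSH4|Instrument|MSH4toMSH6|20230921104820+01:00||RSP^K11^RSP_K11|QPC0amoCwk+2uSHidYKB+Q|P|2.5.1||||||UNICODE UTF-8|||LAB-27R^`;

// First attempt from the question (without the g flag, to keep test() stateless).
const wordOnly = /K11\|\w*\|P/;
// Variant that accepts any character except the field separator.
const anyButPipe = /K11\|([^|]*)\|P/;

const wordOnlyMatches = wordOnly.test(source);
console.log(wordOnlyMatches); // false: + is not a word character

const found = source.match(anyButPipe);
const captured = found ? found[1] : null;
console.log(captured); // QPC0amoCwk+2uSHidYKB+Q
```

This would explain why the \w* attempt fails, but not where QPC0amoCHidY comes from: the complete field is captured here, which fits the suspicion that the value is being altered somewhere else in the code.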