c# regex edi

How to split string by another string


I have this string (it's from EDI data):

ISA*ESA?ISA*ESA?

The * stands for any sequence of characters, of any length.

The ? stands for any single character.

Only the ISA and ESA are guaranteed not to change.

I need this split into two strings, which could look like this: "ISA~this is date~ESA|" and "ISA~this is more data~ESA|"

How do I do this in C#?

I can't use string.Split, because the data doesn't really have a delimiter.


Solution

  • You can use Regex.Split to accomplish this:

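    using System;
    using System.Text.RegularExpressions;

    string splitStr = "|";
    string inputStr = "ISA~this is date~ESA|ISA~this is more data~ESA|";

    // Split only on a separator that directly follows "ESA" and directly precedes "ISA".
    var regex = new Regex($@"(?<=ESA){Regex.Escape(splitStr)}(?=ISA)", RegexOptions.Compiled);
    var items = regex.Split(inputStr);

    // Print each resulting segment on its own line.
    foreach (var item in items) {
        Console.WriteLine(item);
    }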
    

    Output:

    ISA~this is date~ESA
    ISA~this is more data~ESA|
    

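    The trailing | stays on the last item because no ISA follows it. Note that if the data between ISA and ESA contains the same pattern we are splitting on (ESA, then the separator, then ISA), the string will be split there as well, so you will have to work around that case.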

    To explain the regex a bit:

    (?<=ESA)   Look-behind assertion. "ESA" must appear immediately before the separator, but it is not consumed by the match.
    (?=ISA)    Look-ahead assertion. "ISA" must appear immediately after the separator, but it is not consumed by the match.
    

    Using these look-around assertions, you can pick out the correct | character to split on. Because the assertions consume nothing, ESA and ISA stay in the resulting strings.
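
If the terminator after ESA is not known in advance (the question describes it as any single character), one option is to split at a zero-width position instead of on a fixed separator. The following is only a sketch of that variation, not part of the answer above. It assumes every block ends with ESA plus exactly one terminator character, and that the sequence "ESA, one character, ISA" never occurs inside the data itself. The names input, boundary and segments are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input taken from the answer above.
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";

// Zero-width split point: preceded by "ESA" plus any one character,
// and followed by "ISA". Singleline lets '.' also match a newline,
// in case the terminator is a line break (an assumption).
var boundary = new Regex(@"(?<=ESA.)(?=ISA)", RegexOptions.Singleline);

// Nothing is consumed at the split point, so no characters are dropped.
string[] segments = boundary.Split(input);

foreach (var segment in segments)
{
    Console.WriteLine(segment);
}
```

Since the split point has no width, each piece keeps its own terminator, so the sample input yields "ISA~this is date~ESA|" and "ISA~this is more data~ESA|".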