Tags: c#, .net, regex, substring, capture

Capture substring within delimiters and excluding characters using regex


What would a regex pattern look like that captures the substring between two delimiters while excluding certain characters (if present) right after the first delimiter and right before the last one? The input string looks like this, for instance:

var input = @"Not relevant {

#AddInfoStart Comment:String:=""This is a comment"";

AdditionalInfo:String:=""This is some additional info"" ;

# } also not relevant";

The capture should contain the substring between "{" and "}", excluding any spaces, newlines and the "#AddInfoStart" string after the start delimiter "{" (if any of them are present), and also excluding any spaces, newlines, ";" and "#" characters before the end delimiter "}" (again, if any of them are present).

The captured string should look like this:

Comment:String:=""This is a comment"";

AdditionalInfo:String:=""This is some additional info""

There may be blanks before or after the ":" and ":=" internal delimiters, and the value after ":=" is not always a string. For instance:

{  Val1 : Real := 1.7  }

Arrays use the following syntax:

arr1 : ARRAY [1..5] OF INT := [2,5,44,555,11];
arr2 : ARRAY [1..3] OF REAL

Solution

  • This is my solution:

    1. Remove the content outside the brackets
    2. Use a regular expression to get the values inside the brackets

    Code (needs using System.Linq and using System.Text.RegularExpressions):

    var input = @"Not relevant {
    
    #AddInfoStart Comment:String:=""This is a comment"";
    
                Val1 : Real := 1.7
    
    AdditionalInfo:String:=""This is some additional info"" ;
    
    # } also not relevant";
    
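    // remove content outside brackets ("." does not match newlines by default,
    // so each call only trims text on the line that contains the brace)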
    input = Regex.Replace(input, @".*\{", string.Empty);
    input = Regex.Replace(input, @"\}.*", string.Empty);
    
    string property = @"(\w+)"; 
    string separator = @"\s*:\s*"; // ":" with or without whitespace
    string type = @"(\w+)"; 
    string equals = @"\s*:=\s*"; // ":=" with or without whitespace
    string text = @"""(.*?)"""; // value between double quotes (the opening quote is required)
    string number = @"(\d+(\.\d+)?)"; // number like 123 or with a . separator such as 1.45
    string value = $"({text}|{number})"; // value can be a string or number
    string pattern = $"{property}{separator}{type}{equals}{value}";
    
    var result = Regex.Matches(input, pattern)
                      .Cast<Match>()
                      .Select(match => new
                      {
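                          FullMatch = match.Groups[0].Value, // Groups[0] is always the full match
                          Property = match.Groups[1].Value, // name before ":"
                          Type = match.Groups[2].Value, // type between ":" and ":="
                          Value = match.Groups[3].Value // value after ":="; string values keep their quotes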
                      })
                      .ToList();
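
The question also asks for a single capture of the trimmed text between the braces, and mentions array declarations and entries without a ":=" value. Neither is covered by the pattern above. The following is a minimal sketch of one way to handle both, not a definitive implementation. The class name AddInfoParser, its two methods and the group names are made up for illustration. It assumes the body contains no "}" inside a string value, that "ARRAY" and "OF" are written in upper case as in the question, and that a non-string value contains no spaces or ";".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class AddInfoParser
{
    // "{", optional whitespace and "#AddInfoStart", the body (lazy), then any
    // run of whitespace, ";" and "#" directly before the closing "}".
    private static readonly Regex BodyPattern = new Regex(
        @"\{\s*(?:#AddInfoStart)?\s*(?<body>.*?)[\s;#]*\}",
        RegexOptions.Singleline); // Singleline: "." also matches newlines

    // name : type [:= value], where type is a word or "ARRAY [..] OF word" and
    // value is a quoted string, a bracketed list, or a run without spaces or ";".
    private static readonly Regex DeclarationPattern = new Regex(
        @"(?<name>\w+)\s*:\s*(?<type>ARRAY\s*\[[^\]]*\]\s*OF\s+\w+|\w+)" +
        @"(?:\s*:=\s*(?<value>""[^""]*""|\[[^\]]*\]|[^\s;]+))?");

    // Returns true and the trimmed text between the first "{" and the next "}".
    public static bool TryExtractBody(string input, out string body)
    {
        Match match = BodyPattern.Match(input);
        body = match.Success ? match.Groups["body"].Value : string.Empty;
        return match.Success;
    }

    // Splits a body into declarations; Value is empty when there is no ":=".
    public static List<(string Name, string Type, string Value)> ParseDeclarations(string body)
    {
        var result = new List<(string Name, string Type, string Value)>();
        foreach (Match m in DeclarationPattern.Matches(body))
        {
            result.Add((m.Groups["name"].Value, m.Groups["type"].Value, m.Groups["value"].Value));
        }
        return result;
    }
}
```

Calling AddInfoParser.TryExtractBody(input, out var body) on the sample input should leave body starting at "Comment" and ending at the closing quote of the last value. The lazy body group is what keeps the trailing " ;", newlines and "#" out of the capture: it stops at the first position where only those characters remain before "}".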