Tags: c#, regex, trim

Trimming a mathematical expression


I've got a regex, [-]?\d+[-+*^/\d]*, and this line of code:

foreach (Match m in Regex.Matches(s, @"[-]?\d+[-+*^/\d]*")) {
...
}

where m.Value is, for example, 2+4*50.

Is there any way to get a string array in the form of {"2", "+", "4", "*", "50"}?


Solution

  • First of all, you've got the wrong regular expression. Your regex matches the entire expression as a single match, but you want to split the string at token boundaries, so you need a regex that recognizes individual tokens.

    Second, don't try to decide at lex time whether - is a unary minus or a binary operator. That's a parsing problem.

    Third, you can use ordinary LINQ operators to turn the match collection into an array of strings.

    Put it all together:

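        using System.Linq;
        using System.Text.RegularExpressions;

        string s = "10*20+30-40/50";
        // Each match is one token: a run of digits or a single operator.
        string[] tokens =
          Regex.Matches(s, @"\d+|[-+*/^]")
          .Cast<Match>()   // needed where MatchCollection is only non-generic IEnumerable
          .Select(m => m.Value)
          .ToArray();
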

    This technique only works if your lexical grammar is regular, and many are not. (And even some languages whose lexical grammar is technically regular are inconvenient to describe with a regular expression.) As I noted in a comment, writing a lexer is not hard; consider writing one rather than using regular expressions.
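To illustrate the advice to write a lexer, here is a minimal hand-written lexer sketch in C#. It assumes the input contains only non-negative integer literals, the operators + - * / ^, parentheses and whitespace; Tokenize and IsUnaryMinus are hypothetical names chosen for this example, not library APIs. The lexer always emits "-" as its own token, and IsUnaryMinus shows how the unary-versus-binary decision can be left to the parser, which knows the preceding token.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits an arithmetic expression into tokens.
// Assumes only digits, + - * / ^, parentheses and whitespace appear.
List<string> Tokenize(string s)
{
    var tokens = new List<string>();
    int i = 0;
    while (i < s.Length)
    {
        char c = s[i];
        if (char.IsWhiteSpace(c))
        {
            i++;                          // skip whitespace between tokens
        }
        else if (char.IsDigit(c))
        {
            int start = i;
            while (i < s.Length && char.IsDigit(s[i]))
                i++;                      // consume the whole run of digits
            tokens.Add(s.Substring(start, i - start));
        }
        else if ("+-*/^()".IndexOf(c) >= 0)
        {
            tokens.Add(c.ToString());     // operators and parentheses are one-character tokens
            i++;
        }
        else
        {
            throw new FormatException($"Unexpected character '{c}' at position {i}");
        }
    }
    return tokens;
}

// Hypothetical parse-time check: a "-" is unary when it starts the
// expression or follows an operator or an opening parenthesis.
bool IsUnaryMinus(IReadOnlyList<string> tokens, int index)
{
    if (tokens[index] != "-") return false;
    if (index == 0) return true;
    return tokens[index - 1] is "(" or "+" or "-" or "*" or "/" or "^";
}

string[] result = Tokenize("2+4*50").ToArray();
Console.WriteLine(string.Join(", ", result)); // prints "2, +, 4, *, 50"
```

A hand-written loop like this also makes it easy to report the position of an unexpected character, and to extend the token set later (decimal points, identifiers) in ways that quickly become awkward to express as a single regular expression.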