.net regex tokenize evaluation

Tokenizing a mathematical equation using regex


I'm trying to split an equation string into tokens. I've found a good starting point: '([A-Za-z]+|[0-9.]+|[&=><\|!]+|\S)'. However, this has trouble with negative numbers:

turns: '5--4=sin(2+3)'
into: ['5','-','-','4','=','sin','(','2','+','3',')']
want: ['5','-','-4','=','sin','(','2','+','3',')']

and also

turns: '-3+3'
into: ['-','3','+','3']
want: ['-3','+','3']
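The behaviour above can be reproduced with a minimal sketch. It is written in Python purely for illustration (an assumption; the question targets .NET), but this particular pattern uses only features that Python's `re` and .NET's `Regex` share. The helper name `tokenize` is hypothetical.

```python
import re

# The starting-point pattern from the question. Each alternative is tried
# left to right; '-' matches none of the first three, so it falls through
# to \S and always becomes a token of its own.
TOKEN_RE = re.compile(r'([A-Za-z]+|[0-9.]+|[&=><\|!]+|\S)')

def tokenize(text):
    # findall returns the single capture group of each match, in order.
    return TOKEN_RE.findall(text)

print(tokenize('5--4=sin(2+3)'))  # ['5', '-', '-', '4', '=', 'sin', '(', '2', '+', '3', ')']
print(tokenize('-3+3'))           # ['-', '3', '+', '3']
```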

It looks like my regex needs something that checks whether there is a number to the left of the '-' and, if there isn't, keeps the '-' with the next number (note that in '-3' there is nothing to the left of the '-'). Can this be done with a regex, or is there a better tool for splitting this up in .NET?
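The check described above can also be applied to the token list after the regex has run, instead of inside the regex. The sketch below is a hypothetical illustration in Python, not a .NET API: the helper names are made up, and it assumes (going slightly beyond the "number to the left" rule) that identifiers and ')' can also end an operand, so a '-' after them stays binary.

```python
import re

TOKEN_RE = re.compile(r'([A-Za-z]+|[0-9.]+|[&=><\|!]+|\S)')
NUMBER_RE = re.compile(r'[0-9.]+')
SIGNED_NUMBER_RE = re.compile(r'-?[0-9.]+')

def ends_operand(tok):
    # Assumption: a (possibly already signed) number, an identifier or ')'
    # can end an operand; any other token means a following '-' is a sign.
    return tok == ')' or tok.isalpha() or SIGNED_NUMBER_RE.fullmatch(tok) is not None

def tokenize_with_unary_minus(text):
    tokens = TOKEN_RE.findall(text)
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # '-' is unary when nothing precedes it or the previous token
        # cannot end an operand (e.g. another operator or '(').
        unary = tok == '-' and (not out or not ends_operand(out[-1]))
        if unary and nxt is not None and NUMBER_RE.fullmatch(nxt):
            out.append('-' + nxt)  # bundle the sign with the number
            i += 2
        else:
            out.append(tok)
            i += 1
    return out

print(tokenize_with_unary_minus('5--4=sin(2+3)'))  # ['5', '-', '-4', '=', 'sin', '(', '2', '+', '3', ')']
print(tokenize_with_unary_minus('-3+3'))           # ['-3', '+', '3']
```

A '-' in front of something other than a number (for example '-(2)') is left as a separate token in this sketch.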


Solution

  • Regex alone is not powerful enough to do what you want in all contexts. Although you can make a regex recognize + or - as part of a numeric literal, for example by adding an optional [+-]? in front of the digit sequence, the resulting regex would tokenize '-3+3' as ['-3', '+3'], because it cannot tell that the '+' there is a binary operator rather than a sign.

    Using a lexer generator should fix this problem; alternatively, you can keep '-' as a separate token and handle the "bundling" of unary operators with their operands in the parser.
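The optional-sign variant from the answer can be checked with a small Python sketch (Python is an illustrative choice; the helper name `tokenize_signed` is hypothetical). It shows the problem with binary operators: the sign gets absorbed into the number even when it is really an operator.

```python
import re

# The question's pattern with an optional [+-]? in front of the number
# alternative. Because that alternative is tried before \S, any '+' or '-'
# directly followed by a digit is swallowed into the number.
SIGNED_RE = re.compile(r'([A-Za-z]+|[+-]?[0-9.]+|[&=><\|!]+|\S)')

def tokenize_signed(text):
    return SIGNED_RE.findall(text)

print(tokenize_signed('-3+3'))  # ['-3', '+3']
print(tokenize_signed('5-4'))   # ['5', '-4']  (binary minus lost)
```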
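Handling unary minus in the parser can be sketched as a small recursive-descent evaluator in Python. This is an illustrative assumption, not code from the answer: the lexer is the question's original regex (so '-' is always its own token), the `Parser` class and `FUNCTIONS` table are hypothetical, only `+ - * /`, parentheses, unary minus and `sin(...)` are supported, and '=' is not handled.

```python
import math
import re

# Lexer: the question's original pattern; '-' always comes out as its own token.
TOKEN_RE = re.compile(r'([A-Za-z]+|[0-9.]+|[&=><\|!]+|\S)')

# Hypothetical function table for this sketch; only sin is wired up.
FUNCTIONS = {'sin': math.sin}

class Parser:
    """Recursive-descent evaluator; the parser, not the lexer, decides
    whether a '-' is unary or binary."""

    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f'expected {expected!r}, got {tok!r}')
        self.pos += 1
        return tok

    def parse(self):
        value = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f'unexpected token {self.peek()!r}')
        return value

    # expr := term (('+' | '-') term)*   (binary plus and minus)
    def expr(self):
        value = self.term()
        while self.peek() in ('+', '-'):
            if self.take() == '+':
                value += self.term()
            else:
                value -= self.term()
        return value

    # term := unary (('*' | '/') unary)*
    def term(self):
        value = self.unary()
        while self.peek() in ('*', '/'):
            if self.take() == '*':
                value *= self.unary()
            else:
                value /= self.unary()
        return value

    # unary := '-' unary | primary
    # A '-' seen where an operand is expected is bundled with that operand here.
    def unary(self):
        if self.peek() == '-':
            self.take()
            return -self.unary()
        return self.primary()

    # primary := '(' expr ')' | name '(' expr ')' | number
    def primary(self):
        tok = self.take()
        if tok == '(':
            value = self.expr()
            self.take(')')
            return value
        if tok in FUNCTIONS:
            self.take('(')
            arg = self.expr()
            self.take(')')
            return FUNCTIONS[tok](arg)
        return float(tok)  # raises ValueError if tok is not a number

print(Parser('5--4').parse())  # 9.0
print(Parser('-3+3').parse())  # 0.0
```

Because `unary` is only reached where an operand is expected, the '-' in '5--4' is parsed once as subtraction and once as negation without any lookahead in the regex.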