I'm trying to split an equation string into tokens. I've found a good starting point: '([A-Za-z]+|[0-9.]+|[&=><\|!]+|\S)'. However, it has trouble with negative numbers:
turns: '5--4=sin(2+3)'
into: ['5','-','-','4','=','sin','(','2','+','3',')']
want: ['5','-','-4','=','sin','(','2','+','3',')']
and also
turns: '-3+3'
into: ['-','3','+','3']
want: ['-3','+','3']
It looks like my regex could use something that checks whether there is a number to the left of the '-' and, if not, keeps it with the next number (note that '-3' has nothing to its left). Can this be done using regex? Or is there a better tool to split this up in .NET?
Regex is not powerful enough to do what you want in all contexts. Although you can make regex recognize + or - as part of an integer literal, for example, by adding an optional [+-]? in front of a digit sequence, the resultant regex would opt to tokenize '-3+3' as ['-3', '+3'] (demo).
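To make the failure concrete, here is a small sketch in Python (used only for illustration; the pattern itself relies on nothing Python-specific). It takes the pattern from the question and adds the optional sign in front of the digit run. The name tokenize_signed is made up for this example.

```python
import re

# The question's pattern with an optional sign added in front of the digit run.
# No capture group is used, so findall returns each whole match.
SIGNED = re.compile(r"[A-Za-z]+|[+-]?[0-9.]+|[&=><|!]+|\S")

def tokenize_signed(text):
    """Split text into tokens; a sign directly before digits sticks to them."""
    return SIGNED.findall(text)

print(tokenize_signed("-3+3"))           # ['-3', '+3']
print(tokenize_signed("5--4=sin(2+3)"))  # ['5', '-', '-4', '=', 'sin', '(', '2', '+3', ')']
```

The second input gets '-4' right, but the binary '+' in '2+3' is swallowed into '+3', because the pattern has no way of knowing what came before the sign.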
Using a lexer generator should fix this problem; alternatively, you can deal with "bundling" unary operators with their operands in the parser.
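As a rough sketch of the second option, the tokens can be produced with the original pattern and the unary '-' bundled in a separate pass that looks at the previous token. This is only one possible approach, written in Python for illustration; the function name tokenize and the rule for what counts as "the end of an operand" (a name, a number, or a closing parenthesis) are assumptions, not part of the original question.

```python
import re

# The pattern from the question, without the outer capture group.
TOKEN = re.compile(r"[A-Za-z]+|[0-9.]+|[&=><|!]+|\S")
NUMBER = re.compile(r"[0-9.]+")

def tokenize(text):
    """Tokenize, then merge a unary '-' with the number that follows it."""
    raw = TOKEN.findall(text)
    out = []
    i = 0
    while i < len(raw):
        tok = raw[i]
        prev = out[-1] if out else None
        # Assumption: a '-' is binary only when the previous token can end
        # an operand, i.e. it is a name, a number, or a closing parenthesis.
        prev_ends_operand = prev is not None and (
            prev == ")" or prev[-1].isalnum() or prev[-1] == "."
        )
        next_is_number = i + 1 < len(raw) and NUMBER.fullmatch(raw[i + 1])
        if tok == "-" and not prev_ends_operand and next_is_number:
            out.append("-" + raw[i + 1])  # bundle the sign with its operand
            i += 2
        else:
            out.append(tok)
            i += 1
    return out

print(tokenize("5--4=sin(2+3)"))  # ['5', '-', '-4', '=', 'sin', '(', '2', '+', '3', ')']
print(tokenize("-3+3"))           # ['-3', '+', '3']
```

Note that this sketch also merges a '-' separated from its number by whitespace, since whitespace is dropped before the second pass; whether that is desirable depends on the grammar you are parsing.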