I have a token OR:'OR'; that I use for evaluating boolean expressions (a==b OR a==c). I have another rule for parsing state abbreviations that come in a comma-separated list: AZ,AK,OR,GA...
What I am finding is that ANTLR reports an error on the state list because it treats OR as an OR token rather than as part of the stateName rule:
: CHAR CHAR (','|EOF) ->^(STATE CHAR+)
;
How would I go about resolving this ambiguity?
Here are some of the rules I am trying to parse
Here is the grammar that I am using:
grammar PointFieldRule;
options
{
//language = 'CSharp3';
output=AST;
ASTLabelType=CommonTree;
}
tokens{
STATE;
}
rule : ifExpression? actionExpression EOF!
;
ifExpression
:'IF'! logicalConditionExpression
;
logicalConditionExpression
: booleanAndConditionExpression ( BigOR^ booleanAndConditionExpression)*
;
booleanAndConditionExpression
: logicalCondition ( BigAND^ logicalCondition )*
;
BigAND : 'and'|'AND';
logicalCondition
: booleanAndCondition ( OR^ booleanAndCondition )*
;
OR:'||';
booleanAndCondition
: evalCondition ( AND^ evalCondition)*
;
AND: '&&';
evalCondition
: FieldID OPERATOR^ (FieldID|STRING)
;
actionExpression
: 'THEN'! (actionMessage | fieldAction | stateAction )
;
actionMessage
: ('DISPLAY_WARNING' | 'DISPLAY_ERROR')^ STRING
;
fieldAction
: ('DISABLE' | 'REQUIRED')^ FieldID ( ','! FieldID )*
;
stateAction
: 'STATE_LICENSE'^ stateName+ //(','! stateName)*
;
FieldID
:'0'..'9'+;
/* item : FIELD
| CHAR CHAR
;
*/
//class csharpTestLexer extends Lexer;
stateName
: CHAR CHAR (','|EOF) ->^(STATE CHAR+)
;
CHAR: ('a'..'z'|'A'..'Z')
;
WS : (' '
| '\t'
| '\n'
| '\r')
{ $channel = HIDDEN; }
//{ $channel = Hidden; }
;
OPERATOR
: '=='
| '!='
| '<='
| '>='
| '<'
| '>'
| 'TD'
| 'FD'
| 'PD'
| 'TY'
| 'LY'
| 'TM'
| 'LM'
| '+(DELTA%)>'
| '-(DELTA%)>'
| '+(DELTA)>'
| '-(DELTA)>'
| 'LIKE'
;
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment
OCTAL_ESC
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UNICODE_ESC
: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
;
//fragment
BigOR: 'or'|'OR';
The lexer creates tokens independently from the parser. So it doesn't matter if the parser might "need" two CHAR tokens at a given point: if the lexer "sees" the text "OR", it will always create a BigOR token. There's nothing you can do about that.
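To make the point concrete, here is a minimal Java sketch of a tokenizer that works without any parser context. It is not the lexer ANTLR generates: the class TinyLexer, its tokenize method, and the COMMA token name are hypothetical, and the sketch assumes the keyword is tried before single letters. It only illustrates that the text "OR" becomes one BigOR token even in the middle of a state list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the lexer ANTLR generates.
final class TinyLexer {

    // Turns the input into "TYPE:text" entries without consulting any parser state.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                i++; // whitespace is dropped here (the grammar hides it instead)
            } else if (input.startsWith("OR", i) || input.startsWith("or", i)) {
                // The keyword is emitted even in the middle of a state list.
                tokens.add("BigOR:" + input.substring(i, i + 2));
                i += 2;
            } else if (c == ',') {
                tokens.add("COMMA:,");
                i++;
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
                tokens.add("CHAR:" + c);
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("AL,OR,PA"));
    }
}
```

For the input AL,OR,PA this produces the token types CHAR, CHAR, COMMA, BigOR, COMMA, CHAR, CHAR, so any parser rule that should accept the state OR has to allow for a BigOR token.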
In your case, you can simply let stateName match either two CHAR tokens or a single BigOR token, like this:
stateName
: name (','|EOF) ->^(STATE name)
;
name
: CHAR CHAR
| BigOR
;
Parsing the input "THEN STATE_LICENSE AL,OR,PA" will result in the following AST, written here in LISP-style notation: (STATE_LICENSE (STATE A L) (STATE OR) (STATE P A))
Note that the OR is a single token, unlike the others, whose type is CHAR and whose chars are kept as separate nodes. If you want your OR node to behave like that too, do something like this:
name
: CHAR CHAR
| BigOR -> CHAR[""+$BigOR.text.charAt(0)] CHAR[""+$BigOR.text.charAt(1)]
;
resulting in: (STATE_LICENSE (STATE A L) (STATE O R) (STATE P A))
Or if you want the two separate chars to be concatenated, do:
name
: (CHAR CHAR | BigOR) -> CHAR[$text]
;
resulting in: (STATE_LICENSE (STATE AL) (STATE OR) (STATE PA))
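The decision the name rule makes can also be sketched in plain Java, independent of ANTLR. The class StateNames, its parse method, and the {type, text} token arrays are hypothetical stand-ins for the generated parser and its token stream. The sketch accepts either two CHAR tokens or one BigOR token and joins the text into a single state code, mirroring the concatenating variant of the rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "name" rule's logic; not ANTLR output.
final class StateNames {

    // Each token is a two-element array: {type, text}.
    static List<String> parse(List<String[]> tokens) {
        List<String> states = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String type = tokens.get(i)[0];
            if (type.equals("BigOR")) {
                // One BigOR token already carries both letters.
                states.add(tokens.get(i)[1]);
                i += 1;
            } else if (type.equals("CHAR") && i + 1 < tokens.size()
                    && tokens.get(i + 1)[0].equals("CHAR")) {
                // Two CHAR tokens are joined into one state code.
                states.add(tokens.get(i)[1] + tokens.get(i + 1)[1]);
                i += 2;
            } else {
                throw new IllegalArgumentException("unexpected token at index " + i);
            }
            // Every name is followed by a comma or by the end of the input.
            if (i < tokens.size()) {
                if (!tokens.get(i)[0].equals("COMMA")) {
                    throw new IllegalArgumentException("expected ',' at index " + i);
                }
                i++;
            }
        }
        return states;
    }

    public static void main(String[] args) {
        List<String[]> tokens = new ArrayList<>();
        tokens.add(new String[] {"CHAR", "A"});
        tokens.add(new String[] {"CHAR", "L"});
        tokens.add(new String[] {"COMMA", ","});
        tokens.add(new String[] {"BigOR", "OR"});
        System.out.println(parse(tokens));
    }
}
```

Handling the keyword in the parser rule, rather than trying to change what the lexer emits, keeps the lexer free of context and confines the special case to one place.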