I am trying to a simple grammar for parsing python like structures, this is what I could come up with for a list/set
list : '[' atom ( ',' atom)* ']'
set : '(' atom ( ',' atom)* ']'
atom : 'a'..'z' | 'A'..'Z'
| '[' list ']'
| '(' set ')'
Note that this is in antlr, I wanted to know about its correctness and any resources that would help me out
I did look at the python' grammar http://docs.python.org/reference/grammar.html but couldn't quite figure out it was handling list of lists or set of lists or list of sets etc..
Any help would appreciated.
couldn't quite figure out it was handling list of lists or set of lists or list of sets etc..
It doesn't distinguish lists from sets or whatever:
atom: ('(' [yield_expr|testlist_comp] ')' |
'[' [listmaker] ']' |
'{' [dictorsetmaker] '}' |
'`' testlist1 '`' |
NAME | NUMBER | STRING+)
The way they handle recursion of the sort you're describing is that listmaker
, dictorsetmaker
etc. ultimately may contain atom
. For example:
listmaker: test ( list_for | (',' test)* [','] )
test: or_test ['if' or_test 'else' test] | lambdef
or_test: and_test ('or' and_test)*
and_test: not_test ('and' not_test)*
not_test: 'not' not_test | comparison
comparison: expr (comp_op expr)*
expr: xor_expr ('|' xor_expr)*
xor_expr: and_expr ('^' and_expr)*
and_expr: shift_expr ('&' shift_expr)*
shift_expr: arith_expr (('<<'|'>>') arith_expr)*
arith_expr: term (('+'|'-') term)*
term: factor (('*'|'/'|'%'|'//') factor)*
factor: ('+'|'-'|'~') factor | power
power: atom trailer* ['**' factor]
There are a lot of intermediates; that's because they need to establish precedence for a bunch of mathematical operators. Then there list_for
, which allows for adding the extra stuff for a list comprehension.
A much more simplified example might look like:
atom: ('[' [list_or_set] ']' |
'{' [list_or_set] '}' |
NAME | NUMBER | STRING+)
list_or_set: atom (',' atom)* [',']
Or if you want the distinction between lists and sets to be made at this level:
atom: list | set | NAME | NUMBER | STRING+
list: '[' atom (',' atom)* [','] ']'
set: '{' atom (',' atom)* [','] '}'