I'm working on a Lark-based project where I need to be able to "catch" comments in the code being parsed.
However, the callback only fires when I explicitly specify the standard lexer; it does not work when I leave the lexer unspecified.
I have taken the second example from the Lark recipes and modified it to use the default parser and to parse C++-like one-line comments:
import lark
comments = []
grammar = r'''
start: INT*
COMMENT: "//" /[^\n]*/
%import common (INT, WS)
%ignore COMMENT
%ignore WS
'''
# This doesn't work, comments are not appended to the list
# parser = lark.Lark(grammar, lexer_callbacks={'COMMENT': comments.append})
# But this does work
parser = lark.Lark(grammar, lexer='standard', lexer_callbacks={'COMMENT': comments.append})
source = r'''
1 2 3 // hello
// world
4 5 6
'''
parser.parse(source)
print(comments)
If I don't have lexer='standard', the result is an empty list.
But shouldn't it already be using the 'standard' lexer when one isn't explicitly specified? Is it a mistake in my code, or a possible bug in Lark?
Further experimentation seems to indicate that it's either the 'dynamic' or 'dynamic_complete' being used in the default case (lexer not specified).
Lark supports different combinations of parser and lexer. Some support lexer_callbacks, some don't:
parser | lexer | lexer_callbacks |
---|---|---|
lalr | standard | Yes |
lalr | contextual | Yes |
earley | standard | Yes |
earley | dynamic | No |
earley | dynamic_complete | No |
lalr | custom | (Maybe) |
earley | custom | (Maybe) |
lexer="auto" selects a lexer depending on the parser: for lalr it selects contextual, for earley it selects dynamic. The default parser is earley, so without selecting parser or lexer, lexer_callbacks are not supported.
An issue in this regard was already opened and has since been closed.