I would like to define a basic grammar to get started with Lark. Here is my M(not)WE:
from lark import Lark
GRAMMAR = r"""
?start: _NL* (day_heading)*
day_heading : "==" _NL day_nb _NL "==" _NL+ (paragraph _NL)*
day_nb : /\d{2}/
paragraph : /[^\n={2}]+/ (_NL+ paragraph)*
_NL : /(\r?\n[\t ]*)+/
"""
parser = Lark(GRAMMAR)
tree = parser.parse("""
==
12
==
Bla, bla
Bli, Bli
Blu, Blu
==
10
==
Blo, blo
""")
print(tree.pretty())
This prints:
start
  day_heading
    day_nb 12
    paragraph
      Bla, bla
      paragraph
        Bli, Bli
        paragraph Blu, Blu
  day_heading
    day_nb 10
    paragraph Blo, blo
The tree I want is the following:
start
  day_heading
    day_nb 12
    paragraph
      line Bla, bla
      line Bli, Bli
      line Blu, Blu
  day_heading
    day_nb 10
    paragraph
      line Blo, blo
How can I modify my EBNF?
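Here is a possible answer: I had misused a recursive rule in my initial question. Instead of letting paragraph call itself, I introduced a separate line rule and defined a paragraph as one or more lines, each followed by _NL; day_heading then takes (paragraph)+ instead of (paragraph _NL)*. (As a side note, renaming _NL to NL would keep the newlines in the tree, because Lark filters out terminals whose names start with an underscore.)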
from lark import Lark
GRAMMAR = r"""
?start: _NL* (day_heading)*
day_heading : "==" _NL day_nb _NL "==" _NL+ (paragraph)+
day_nb : /\d{2}/
paragraph : (line _NL)+
// any run of characters other than newline and "="
line : /[^\n=]+/
_NL : /(\r?\n[\t ]*)+/
"""
parser = Lark(GRAMMAR)
tree = parser.parse("""
==
12
==
Bla, bla
Bli, Bli
Blu, Blu
==
10
==
Blo, blo
""")
print(tree.pretty())
This produces:
start
  day_heading
    day_nb 12
    paragraph
      line Bla, bla
      line Bli, Bli
      line Blu, Blu
  day_heading
    day_nb 10
    paragraph
      line Blo, blo
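One pitfall in the pattern /[^\n={2}]+/ used for the text lines in the question's grammar: inside square brackets, {2} is not a repetition count but three literal characters, so the class also rejects "{", "2" and "}". A character class cannot express "not the sequence ==" at all; excluding "=" alone is presumably what was intended. Since Lark compiles /.../ terminals with Python's re module by default, the difference can be checked directly. This is a small sketch; the sample strings are made up for illustration.

```python
import re

# Pattern from the grammar: the class excludes \n, "=", "{", "2" and "}".
ORIGINAL = re.compile(r"[^\n={2}]+")
# Presumably intended: any run of characters except newline and "=".
SIMPLER = re.compile(r"[^\n=]+")

for text in ["Bla, bla", "Meeting at 2pm", "{draft}"]:
    # fullmatch requires the whole line to be matched by the character class.
    print(repr(text),
          "original:", bool(ORIGINAL.fullmatch(text)),
          "simpler:", bool(SIMPLER.fullmatch(text)))
```

With the original pattern, the second and third lines do not match, so a day containing a line such as "Meeting at 2pm" would fail to parse.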