I am trying to write a Lark grammar for a DSL, but I am having trouble with this string interpolation syntax:
" abc " <- normal string
" xyz~{expression}abc " <- string with interpolation
So a ~{ switches from string to expression, and a } terminates that expression. I think this is close:
string : "\"" (string_interp|not_string_interp)* "\""
string_interp: "~{" expression "}"
not_string_interp: /([^~][^{])+/
But the regex consumes characters two at a time, so it can only match runs of even length, and if a ~{ straddles a pair boundary (the ~ ends one pair and the { starts the next), it is missed.
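A quick check with Python's re module (plain regex matching, not Lark itself) illustrates the pairing problem. The sample inputs are made up for the illustration:

```python
import re

# The first attempt from the grammar: consumes characters strictly in pairs.
PAIRS = re.compile(r'([^~][^{])+')

# Odd-length plain text cannot be matched in full.
odd = PAIRS.fullmatch('abc')

# "~{" aligned on a pair boundary: the match stops just before it.
aligned = PAIRS.match('ab~{1}')

# "~{" straddling a pair boundary: '~' is taken as the second half of one
# pair and '{' as the first half of the next, so the marker is swallowed.
straddled = PAIRS.match('a~{1}')

print(odd)                 # None
print(aligned.group(0))    # ab
print(straddled.group(0))  # a~{1
```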
not_string_interp: /(.?|([^~][^{])+)/
This is about as far as I could get, but it still seems wrong. Can I use lookaheads? I also want to keep %ignore WS on, as it cuts the noise down massively, so a solution that accounts for that would be great!
Thanks
Test cases:
""
"a"
"~{1}"
" ~{1} "
"a bc~{1}c d"
"a b~{1}c d"
I think this does it. Sadly, any ~ not followed by { splits the string into separate pieces, but I can rejoin them later. I was being fooled by the equal precedence of rules and the greediness of regexes. The string body is built from three alternatives:
/[^"~]+/
anything that is not ~ or " (regular string)
"~{" expression "}"
the interpolated expression
/~(?!{)/
handles a ~ that is not followed by {. The negative lookahead (?!...) is needed because the next character must not be consumed (it could be " or another ~)
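To see how the three alternatives divide up a string body, here is a rough sketch using Python's re module instead of Lark. It is only an approximation of the grammar: the names PIECE and split_body are hypothetical, and the expression is assumed to be anything up to the next }.

```python
import re

# One alternative per grammar branch; exactly one named group matches each time.
PIECE = re.compile(r'''
    (?P<text>[^"~]+)          # plain text: anything except " or ~
  | ~\{(?P<expr>[^}]+)\}      # interpolation: ~{ expression }
  | (?P<tilde>~(?!\{))        # lone ~ that does not start an interpolation
''', re.VERBOSE)


def split_body(body):
    """Split the text between the quotes into (kind, value) pieces."""
    pieces = []
    pos = 0
    while pos < len(body):
        m = PIECE.match(body, pos)
        if m is None:
            raise ValueError(f'cannot tokenize at position {pos}: {body[pos:]!r}')
        pieces.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return pieces


print(split_body('a~b{}c}d~{1}g'))
# [('text', 'a'), ('tilde', '~'), ('text', 'b{}c}d'), ('expr', '1'), ('text', 'g')]
```

The lone ~ comes out as its own piece, which is the splitting mentioned above; adjacent text and tilde pieces can be joined back together afterwards.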
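from lark import Lark

# ambiguity="explicit" keeps every derivation (shown under _ambig nodes);
# it relies on Lark's default Earley parser.
parser = Lark(r"""
string: "\"" string_thing* "\""
string_thing: /[^"~]+/
            | "~{" expression "}"
            | /~(?!{)/
expression: /[^}]+/
""", start='string', ambiguity="explicit")

# Other inputs to try: '"a"', '"~abc~"', '"~{1}~~{1}~~~{1}"'
print(parser.parse('"a~b{}c}d~{1}g"').pretty())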