I am working on creating my own language using Rex and Racc, but I have gotten stuck. I am not sure how to add functions, or any kind of code that won't be immediately executed like a lambda. I have added blocks/lambdas into the language, but whatever is in the block is executed immediately. How could I make a block/lambdas which can be run at any time, multiple times and has it's own scope? Or even something like an if statement, where the "block" is only executed if the statement is true?
Here is my code:
lexer.rex:
class MyLang
macro
BLANK [\s]+
VAR [a-zA-Z_]\w*
NUMBER \d+
MULTIPLY \*
DIVIDE \/
ADD \+
SUBTRACT \-
EQUALS =
LEFT_PARENTHESIS \(
RIGHT_PARENTHESIS \)
STRING ("([^"]|\\")*?(?<!\\)")|('([^']|\\')*?(?<!\\)')
CURLY_BRACKET_L {
CURLY_BRACKET_R }
rule
{BLANK}
{VAR} { [:VAR, text.to_sym] }
{NUMBER} { [:NUMBER, text.to_i] }
{MULTIPLY} { [:MULTIPLY, text.to_sym] }
{DIVIDE} { [:DIVIDE, text.to_sym] }
{ADD} { [:ADD, text.to_sym] }
{SUBTRACT} { [:SUBTRACT, text.to_sym] }
{EQUALS} { [:EQUALS, text.to_sym] }
{LEFT_PARENTHESIS} { [:LEFT_PARENTHESIS, text.to_sym] }
{RIGHT_PARENTHESIS} { [:RIGHT_PARENTHESIS, text.to_sym] }
{STRING} { [:STRING, text] }
{CURLY_BRACKET_L} { [:CURLY_BRACKET_L, text.to_sym] }
{CURLY_BRACKET_R} { [:CURLY_BRACKET_R, text.to_sym] }
inner
def tokenize(code)
scan_setup(code)
tokens = []
while token = next_token
tokens << token
end
tokens
end
end
parser.y:
class MyLang
prechigh
left LEFT_PARENTHESIS
left RIGHT_PARENTHESIS
left MULTIPLY
left DIVIDE
left ADD
left SUBTRACT
right EQUALS
preclow
rule
expression : value
| block
value : NUMBER { return Value.new(val[0], "Number") }
| STRING { return Value.new(MyLangCore.str_escape(val[0]), "String") }
| assignment
| value MULTIPLY value { return MyLangCore.binary_operator(val[0], val[2], val[1]) }
| value DIVIDE value { return MyLangCore.binary_operator(val[0], val[2], val[1]) }
| value ADD value { return MyLangCore.binary_operator(val[0], val[2], val[1]) }
| value SUBTRACT value { return MyLangCore.binary_operator(val[0], val[2], val[1]) }
| LEFT_PARENTHESIS value RIGHT_PARENTHESIS { return val[1] }
| VAR { return MyLangCore.get_var(val[0]) }
assignment : VAR EQUALS value { return MyLangCore.new_variable(val[0], val[2]) }
block : CURLY_BRACKET_L expression CURLY_BRACKET_R
end
---- header
require_relative "lexer"
require_relative "my_lang_core"
---- inner
def parse(input)
scan_str(input)
end
This is how I'm evaluating my language:
#!/usr/bin/env ruby
require_relative "parser.rb"
require "minitest/autorun"
# check for errors once they exist
describe MyLang do
before do
@parser = MyLang.new
end
describe "variables" do
it "assigns usable variables" do
@parser.parse("a = 4")
@parser.parse("a").value.must_equal 4
end
it "does complex assignments" do
@parser.parse("a = (4 + 8) * 2")
@parser.parse("b = 2 * (c = a + 1) + 1")
@parser.parse("a").value.must_equal 24
@parser.parse("b").value.must_equal 51
@parser.parse("c").value.must_equal 25
end
it "allows cool variable names" do
@parser.parse("_123 = 74")
@parser.parse("_123").value.must_equal 74
end
end
describe "PEMDAS" do
it "does math" do
@parser.parse("10 + 12 * 3 + 2").value.must_equal 48
end
it "does simple parentheses" do
@parser.parse("(1)").value.must_equal 1
end
it "uses parentheses" do
@parser.parse("(10 + 12) * 3 + 2").value.must_equal 68
end
it "does multi-level parentheses" do
@parser.parse("(3 - (2 - 1)) * 4").value.must_equal 8
end
end
describe "strings" do
it "parses strings" do
@parser.parse(%{'hello world.'}).value.must_equal "hello world."
@parser.parse(%{"hello world."}).value.must_equal "hello world."
@parser.parse(%{''}).value.must_equal ""
end
it "assigns strings" do
@parser.parse("a = 'hey'")
@parser.parse("a").value.must_equal "hey"
end
# TODO: fix
# it "handles escape charectors" do
# @parser.parse(%{"hey\\"\\n"}).value.must_equal "hey\"\n"
# end
it "adds strings" do
@parser.parse(%{"Hello, " + "world" + "!"}).value.must_equal "Hello, world!"
end
it "multiplies strings" do
@parser.parse(%{"1" * 3}).value.must_equal "111"
end
it "adds and multiplies" do
@parser.parse(%{("Na" + "N ") * 3 + "!"}).value.must_equal "NaN NaN NaN !"
end
end
# this is what I need to implement
describe "blocks" do
@parser.parse("{ a, b in a + b }(5, 4)")
end
end
l = MyLang.new
p l.parse("{ 1 + 2 }")
The approach used by the example in Racc's wiki (where the expressions are directly evaluated during parsing) only works for simple expression evaluators. As soon as you add any type of control flow, it stops working - as you've noticed.
The common way to implement interpreters is to let the parser construct some kind of intermediate representation of the source code - typically an abstract syntax tree or some sort of bytecode. After parsing this representation is then executed as a separate step.