Search code examples
rubylambdalanguage-designrex

Adding lambdas to my programming language


I am working on creating my own language using Rex and Racc, but I have gotten stuck. I am not sure how to add functions, or any kind of code that won't be immediately executed like a lambda. I have added blocks/lambdas into the language, but whatever is in the block is executed immediately. How could I make a block/lambdas which can be run at any time, multiple times and has it's own scope? Or even something like an if statement, where the "block" is only executed if the statement is true?

Here is my code:

lexer.rex:

class MyLang
macro
    BLANK [\s]+
    VAR [a-zA-Z_]\w*
    NUMBER \d+
    MULTIPLY \*
    DIVIDE \/
    ADD \+
    SUBTRACT \-
    EQUALS =
    LEFT_PARENTHESIS \(
    RIGHT_PARENTHESIS \)
    STRING ("([^"]|\\")*?(?<!\\)")|('([^']|\\')*?(?<!\\)')
    CURLY_BRACKET_L {
    CURLY_BRACKET_R }

rule
    {BLANK}  
    {VAR} { [:VAR, text.to_sym] }
    {NUMBER} { [:NUMBER, text.to_i] }
    {MULTIPLY} { [:MULTIPLY, text.to_sym] }
    {DIVIDE} { [:DIVIDE, text.to_sym] }
    {ADD} { [:ADD, text.to_sym] }
    {SUBTRACT} { [:SUBTRACT, text.to_sym] }
    {EQUALS} { [:EQUALS, text.to_sym] }
    {LEFT_PARENTHESIS} { [:LEFT_PARENTHESIS, text.to_sym] }
    {RIGHT_PARENTHESIS} { [:RIGHT_PARENTHESIS, text.to_sym] }
    {STRING} { [:STRING, text] }
    {CURLY_BRACKET_L} { [:CURLY_BRACKET_L, text.to_sym] }
    {CURLY_BRACKET_R} { [:CURLY_BRACKET_R, text.to_sym] }

inner
    def tokenize(code)
        scan_setup(code)
        tokens = []
        while token = next_token
            tokens << token
        end
        tokens
    end

end

parser.y:

class MyLang
prechigh
    left LEFT_PARENTHESIS
    left RIGHT_PARENTHESIS
    left MULTIPLY
    left DIVIDE
    left ADD
    left SUBTRACT
    right EQUALS
preclow

rule
    expression : value
    | block

    value : NUMBER { return Value.new(val[0], "Number") }
    | STRING { return Value.new(MyLangCore.str_escape(val[0]), "String") }
    | assignment
    | value MULTIPLY value { return MyLangCore.binary_operator(val[0], val[2], val[1]) }
    | value DIVIDE value { return MyLangCore.binary_operator(val[0], val[2], val[1]) }
    | value ADD value { return MyLangCore.binary_operator(val[0], val[2], val[1]) }
    | value SUBTRACT value { return MyLangCore.binary_operator(val[0], val[2], val[1]) }
    | LEFT_PARENTHESIS value RIGHT_PARENTHESIS { return val[1] }
    | VAR { return MyLangCore.get_var(val[0]) }

    assignment : VAR EQUALS value { return MyLangCore.new_variable(val[0], val[2]) }

    block : CURLY_BRACKET_L expression CURLY_BRACKET_R

end

---- header
    require_relative "lexer"
    require_relative "my_lang_core"

---- inner
    def parse(input)
        scan_str(input)
    end

This is how I'm evaluating my language:

#!/usr/bin/env ruby

require_relative "parser.rb"
require "minitest/autorun"

# check for errors once they exist
describe MyLang do

    before do
        @parser = MyLang.new
    end

    describe "variables" do
        it "assigns usable variables" do
            @parser.parse("a = 4")
            @parser.parse("a").value.must_equal 4
        end

        it "does complex assignments" do
            @parser.parse("a = (4 + 8) * 2")
            @parser.parse("b = 2 * (c = a + 1) + 1")
            @parser.parse("a").value.must_equal 24
            @parser.parse("b").value.must_equal 51
            @parser.parse("c").value.must_equal 25
        end

        it "allows cool variable names" do
            @parser.parse("_123 = 74")
            @parser.parse("_123").value.must_equal 74
        end
    end

    describe "PEMDAS" do
        it "does math" do
            @parser.parse("10 + 12 * 3 + 2").value.must_equal 48 
        end

        it "does simple parentheses" do
            @parser.parse("(1)").value.must_equal 1
        end

        it "uses parentheses" do
            @parser.parse("(10 + 12) * 3 + 2").value.must_equal 68
        end

        it "does multi-level parentheses" do
            @parser.parse("(3 - (2 - 1)) * 4").value.must_equal 8
        end
    end

    describe "strings" do
        it "parses strings" do
            @parser.parse(%{'hello world.'}).value.must_equal "hello world."
            @parser.parse(%{"hello world."}).value.must_equal "hello world."
            @parser.parse(%{''}).value.must_equal ""
        end

        it "assigns strings" do
            @parser.parse("a = 'hey'")
            @parser.parse("a").value.must_equal "hey"
        end

        # TODO: fix
        # it "handles escape charectors" do
        #   @parser.parse(%{"hey\\"\\n"}).value.must_equal "hey\"\n"
        # end

        it "adds strings" do
            @parser.parse(%{"Hello, " + "world" + "!"}).value.must_equal "Hello, world!"
        end

        it "multiplies strings" do
            @parser.parse(%{"1" * 3}).value.must_equal "111"
        end

        it "adds and multiplies" do
            @parser.parse(%{("Na" + "N ") * 3 + "!"}).value.must_equal "NaN NaN NaN !"
        end
    end

    # this is what I need to implement
    describe "blocks" do
        @parser.parse("{ a, b in a + b }(5, 4)")
    end

end

l = MyLang.new
p l.parse("{ 1 + 2 }")

Solution

  • The approach used by the example in Racc's wiki (where the expressions are directly evaluated during parsing) only works for simple expression evaluators. As soon as you add any type of control flow, it stops working - as you've noticed.

    The common way to implement interpreters is to let the parser construct some kind of intermediate representation of the source code - typically an abstract syntax tree or some sort of bytecode. After parsing this representation is then executed as a separate step.