Search code examples
parsingrustbrainfuck

Parsing brainf*ck code to tree in Rust


I am trying to write an optimizing brainfuck compiler in Rust. Currently it stores tokens in a flat vector, which works, but I am having trouble changing it to use a syntax tree:

#[derive(Clone, PartialEq, Eq)]
pub enum Token {
    Output,
    Input,
    Loop(Vec<Token>),
    Move(i32),
    Add(i32, i32),
    LoadOut(i32, i32),
}
use Token::*;

pub fn parse(code: &str) -> Vec<Token> {
    let mut alltokens = Vec::new();
    let mut tokens = &mut alltokens;
    let mut tokvecs: Vec<&mut Vec<Token>> = Vec::new();
    for i in code.chars() {
        match i {
            '+' => tokens.push(Add(0, 1)),
            '-' => tokens.push(Add(0, -1)),
            '>' => tokens.push(Move(1)),
            '<' => tokens.push(Move(-1)),
            '[' => {
                tokens.push(Loop(Vec::new()));
                tokvecs.push(&mut tokens);
                if let &mut Loop(mut newtokens) = tokens.last_mut().unwrap() {
                    tokens = &mut newtokens;
                }
            },
            ']' => {
                tokens = tokvecs.pop().unwrap();
            },
            ',' => tokens.push(Input),
            '.' => {
                tokens.push(LoadOut(0, 0));
                tokens.push(Output);
            }
            _ => (),
        };
    }

    alltokens
}

What I am having trouble figuring out is how to handle the [ command. The current implementation in the code is one of several I have tried, all of which have failed. I think it may require use of Rust's Box, but I can't quite understand how that is used.

The branch handling the [ command is probably completely wrong, but I'm not sure how it should be done. It pushes a Loop (a variant of the Token enum) containing a vector to the tokens vector. The problem is to then get a mutable borrow of the vector in that Loop, which the if let statement is supposed to do.

The code fails to compile since newtokens does not outlive the end of the if let block. Is it possible to get a mutable reference to the vector inside Loop, and set tokens to it? If not, what could be done instead?


Solution

  • I've gotten the code to work by making it a recursive function:

    #[derive(Clone, PartialEq, Eq)]
    pub enum Token {
        Output,
        Input,
        Loop(Vec<Token>),
        Move(i32),
        Add(i32, i32),
        LoadOut(i32, i32),
    }
    use Token::*;
    
    pub fn parse(code: &str) -> Vec<Token> {
        _parse(&mut code.chars())
    }
    
    fn _parse(chars: &mut std::str::Chars) -> Vec<Token> {
        let mut tokens = Vec::new();
        while let Some(i) = chars.next() {
            match i {
                '+' => tokens.push(Add(0, 1)),
                '-' => tokens.push(Add(0, -1)),
                '>' => tokens.push(Move(1)),
                '<' => tokens.push(Move(-1)),
                '[' => tokens.push(Loop(_parse(chars))),
                ']' => { break; }
                ',' => tokens.push(Input),
                '.' => {
                    tokens.push(LoadOut(0, 0));
                    tokens.push(Output);
                }
                _ => (),
            };
        }
    
        tokens
    }
    

    It seems to work, and is reasonably simple and elegant (I'd still be interested to see a solution that doesn't use recursion).