Search code examples
assemblyrustcompiler-constructioncompiler-optimizationpeephole-optimization

Nicer way to pattern match window of assembly instructions for peephole w/ Rust?


So I am trying to implement peephole optimisation where I go from a Vec<LLStackInstruction> -> Vec<LLStackInstruction>, where the returned list is optimised. (LL for low level) Our compiler is modeled as an abstract stack machine and for an init/assign we push RHS onto stack, push addr of LHS on stack and then pop and STR but trying to reduce this into a push RHS and then store into address straight through an address specififer but I get a very ugly matching like so: (and with mutability)

pub fn peephole_optimiser(instrs: Vec<LLStackInstruction>) -> Vec<LLStackInstruction> {
    let mut optimized_instructions: Vec<LLStackInstruction> = Vec::new();

    let mut iter = instrs.iter().peekable();

    while let Some(instruction) = iter.next() {
        let window_iter = iter.clone();
        let window: Vec<_> = [vec![instruction], window_iter.take(10).collect()].concat();
        
        if let Some(LLStackInstruction::Push(Register::FP)) = window.get(0) {
            if let Some(LLStackInstruction::Mov(Condition::None, r1, ArgType::Imm(n))) = window.get(1) {
                if let Some(LLStackInstruction::Push(r2)) = window.get(2) {
                    if let Some(LLStackInstruction::Pop(_)) = window.get(3) {
                        if let Some(LLStackInstruction::Pop(_)) = window.get(4) {
                            if let Some(LLStackInstruction::Sub(..)) = window.get(5) {
                                if let Some(LLStackInstruction::Branch(..)) = window.get(6) {
                                    if let Some(LLStackInstruction::Push(_)) = window.get(7) {
                                        if let Some(LLStackInstruction::Pop(_)) = window.get(8) {
                                            if let Some(LLStackInstruction::Pop(_)) = window.get(9) {
                                                if let Some(LLStackInstruction::Str(..)) = window.get(10) {
                                                    if r1 == r2 {
                                                        optimized_instructions.push(LLStackInstruction::Pop(Register::R4));
                                                        optimized_instructions.push(LLStackInstruction::Str(Register::R4, AddressSpecifier::ImmediateOffset(Register::FP, -n)));
                                                        iter.nth(9);
                                                        continue;
                                                    }
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }

        optimized_instructions.push(instruction.clone());
    }
    optimized_instructions
}

How could I go about making this nicer

Example of what this optimisation does

            // Compact init/assign statements
            // push {fp}
            // mov r1, #88
            // push {r1}
            // pop {r0}
            // pop {r1}
            // subs r0, r1, r0
            // bvs _overflow_err
            // push {r0}
            // pop {r3}
            // pop {r4}
            // str r4, [r3]

            // INTO
            // pop {r4}
            // str r4, [fp, #-88]

Only thing I could think of was if I could express this into multiple steps of smaller windows and do multiple optimisation passes (or after optimising a window, rerun from its start on the result until its unchanged)


Solution

  • As mentioned by Jonas, you can use slice pattern matching for greater ergonomics, and on nightly you can combine it with let-chains to avoid a nested-if.

    The difficulty then, is how to combine windowing with skipping an arbitrary number of instructions on success.

    The bare-bones solution is to use manual indexing:

    pub fn peephole_optimiser(instrs: &[LLStackInstruction]) -> Vec<LLStackInstruction> {
        const WINDOW_SIZE: usize = 11;
    
        type Instr = LLStackInstruction;
    
        if instrs.len() < WINDOW_SIZE{
            optimized.extend_from_slice(instrs);
            return optimized;
        }
    
        let mut optimized = Vec::new();
        let mut index = 0;
    
        while index < instrs.len() - WINDOW_SIZE {
            if let [
                Instr::Push(Register::FP),
                Instr::Mov(Condition::None, r1, ArgType::Imm(_)),
                Instr::Push(r2),
                Instr::Pop(_),
                Instr::Pop(_),
                Instr::Sub(..),
                Instr::Branch(..),
                Instr::Push(..),
                Instr::Pop(..),
                Instr::Pop(..),
                Instr::Str(..)
            ] = &instrs[index..][..WINDOW_SIZE]
            {
                if r1 == r2 {
                    optimized.push(LLStackInstruction::Pop(Register::R4));
                    optimized.push(LLStackInstruction::Str(<>));
    
                    index += WINDOW_SIZE;
                    continue;
                }
            }
            
            optimized.push(window[0].clone());
            index += 1;
        }
    
        optimized
    }
    

    but it's a bit harder to understand the purpose, and you run the risk of forgetting to increase the index before running the loop again... which would lead to an infinite loop.

    slice has a windows functions to iterate over windows, in a for loop though it requires a manual skip:

    pub fn peephole_optimiser(instrs: &[LLStackInstruction]) -> Vec<LLStackInstruction> {
        const WINDOW_SIZE: usize = 11;
    
        type Instr = LLStackInstruction;
    
        let mut optimized = Vec::new();
    
        if instrs.len() < WINDOW_SIZE {
            optimized.extend_from_slice(instrs);
            return optimized;
        }
    
        let mut skip = 0;
    
        for window in instrs.windows(WINDOW_SIZE) {
            if skip > 0 {
                skip -= 1;
                continue;
            }
    
            if let [
                Instr::Push(Register::FP),
                Instr::Mov(Condition::None, r1, ArgType::Imm(_)),
                Instr::Push(r2),
                Instr::Pop(_),
                Instr::Pop(_),
                Instr::Sub(..),
                Instr::Branch(..),
                Instr::Push(..),
                Instr::Pop(..),
                Instr::Pop(..),
                Instr::Str(..)
            ] = window
            {
                if r1 == r2 {
                    optimized.push(Instr::Pop(Register::R4));
                    optimized.push(Instr::Str(<>));
                    
                    skip = 9;
                    continue;
                }
            }
            
            optimized.push(window[0].clone());
        }
    
        optimized
    }
    

    The extra variable is not great, though.

    In the end, maybe a combination of slice::windows with a somewhat more manual iteration mode is the best we can achieve:

    • Single iteration variable, rather than 2 disparate ones.
    • With no way to accidentally run into an infinite loop.

    Especially with extracting the matching into a separate function, I think it's pretty readable:

    pub fn peephole_optimiser(instrs: &[LLStackInstruction]) -> Vec<LLStackInstruction> {
        const WINDOW_SIZE: usize = 11;
    
        type Instr = LLStackInstruction;
    
        fn matches(window: &[Instr]) -> bool {
            assert_eq!(WINDOW_SIZE, window.len());
    
            if let [
                Instr::Push(Register::FP),
                Instr::Mov(Condition::None, r1, ArgType::Imm(_)),
                Instr::Push(r2),
                Instr::Pop(_),
                Instr::Pop(_),
                Instr::Sub(..),
                Instr::Branch(..),
                Instr::Push(..),
                Instr::Pop(..),
                Instr::Str(..)
            ] = window
            {
                r1 == r2
            } else {
                false
            }
        }
    
        let mut optimized = Vec::new();
    
        let mut windows = instrs.windows(WINDOW_SIZE);
    
        while let Some(window) = windows.next() {
            if matches(window) {
                optimized.push(Instr::Pop(Register::R4));
                optimized.push(Instr::Str(<>);
                
                windows.nth(WINDOW_SIZE - 2);
                continue;
            }
            
            optimized.push(window[0].clone());
        }
    
        optimized
    }