parsing rebol rebol3

How do you parse 4-bit chunks from binary?


I'm trying to understand how I might parse binary data 4 bits at a time, if that is possible.

For example: I have 2-byte codes that need to be parsed to determine which instruction to use.

They look like #{1NNN}, where the first 4 bits select the instruction and NNN is a memory location (e.g. #{1033} means jump to memory address #{0033}).

It seems to be easy to do this with full bytes, but not with half bytes:

parse #{1022} [#{10} #{22}]

because #{1} isn't valid binary!

So far, I've used giant switch statements with tests like #{1033} AND #{F000} = #{1000} to process these, but I'm wondering how a more experienced Reboler might do this.


Solution

  • This is a rather long answer, but it addresses your needs and shows off PARSE a bit.

    It is a working, albeit simple, VM that uses the memory layout you describe above.

    I set up a small block of RAM holding an actual program, which runs when I call PARSE with the emulator grammar rule. The program increments the operand of the jump instruction that follows it, then jumps to that (now incremented) address, skipping over a NOP.

    It then executes a DEC, hits an illegal opcode, and stops.
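
    The core trick can be sketched in isolation before reading the full script. This is a minimal example, assuming Rebol 2 semantics (PARSE accepts binary! input and a bitset! rule matches exactly one byte); the names high-1, instr and operand are made up for illustration and do not appear in the script below.

    ```rebol
    REBOL []

    ; all 16 byte values whose high nibble is 1: #{10} through #{1F}
    high-1: charset [#"^(10)" - #"^(1F)"]

    parse/all #{1033} [
        here: high-1          ; match the first byte by its high nibble only
        :here                 ; rewind so the whole instruction can be copied
        copy instr 2 skip     ; grab the full 2-byte instruction
        end
    ]

    ; mask off the opcode nibble, leaving the 12-bit operand
    operand: (to-binary instr) and #{0FFF}   ; #{0033}
    print to-integer operand                 ; 51
    ```

    Matching on a bitset sidesteps the fact that #{1} is not a valid binary literal: the rule never needs a half byte, only the set of whole bytes that share the wanted high nibble.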

    REBOL [
        title:  "simple VM using Parse, from scratch, using no external libraries"
        author: "Maxim Olivier-Adlhoch"
        date:    2013-11-15
    ]
    
    ;----
    ; builds a bitset that matches any byte whose high nibble equals VALUE,
    ; i.e. the 16 bytes from (value * 16) to (value * 16 + 15),
    ; so only the high 4 bits decide whether a byte matches
    ;----
    quaternary: func [value][
        make bitset! reduce [
            to-char (value * 16)
            '- 
            to-char ((value * 16) + 15)
        ]
    ]
    
    ;------
    ; get the 12 least significant bits of a 16 bit value
    LSB-12: func [address [string! binary!] ][
        as-binary (address AND #{0FFF})
    ]
    
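    ;------
    ; convert an integer to binary via its hex form (4 bytes with Rebol 2's 8-digit TO-HEX);
    ; /rev returns the bytes in reverse order
    i32-to-binary: func [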
        n [integer!] 
        /rev
    ][
        n: load join "#{" [form to-hex to-integer n "}"]
        either rev [head reverse n][n]
    ]
    
    ;------
    ; load value at given address. (doesn't clear the opcode).
    LVAL: func [addr [binary!]][
        to-integer copy/part at RAM ( (to-integer addr) + 1) 2
    ]
    
    
    ;------
    ; implement the opcodes which are executed by the CPU
    JMP: func [addr][
        print ["jumping to " addr]
        continue: at RAM ((to-integer addr) + 1) ; 0 based address but 1 based indexing ;-)
    ]
    
    INC: func [addr][
        print ["increment value at address: " addr]
        new-val: 1 + LVAL addr
        addr: 1 + to-integer addr
        bin-val: at (i32-to-binary new-val) 3
        change at RAM addr bin-val
    ]
    
    DEC: func [addr][
        print ["decrement value at address: " addr]
    ]
    
    NOP: func [addr][
        print "skipping Nop opcode"
    ]
    
    
    
    ;------
    ; build the bitsets to match op codes
    op1: quaternary 1
    op2: quaternary 2
    op3: quaternary 3
    op4: quaternary 4
    
    
    ;------
    ; build up our CPU emulator grammar
    emulator: [ 
        some [
            [
                here:
                [ op1 (op: 'JMP)  | op2 (op: 'INC)  | op3 (op: 'DEC)  | op4 (op: 'NOP)] ; choose op code
                :here 
    
                copy addr 2 skip (addr: LSB-12 addr) ; get unary op data
                continue:
                (do reduce [op addr])
                :continue
            ]
            | 2 skip (
                print ["^/^/^/ERROR:  illegal opcode AT: " to-binary here " offset[" -1 + index? here "]"] ; graceful crash!
            )
        ]
    ]
    
    
    
    ;------
    ; generate a bit of binary RAM for our emulator/VM to run...
    
           0   2   4   6   8    ; byte offset of each instruction; bare integers evaluate harmlessly, so no comment marker is needed
    RAM: #{2002100540FF30015FFF}
    RAM-blowup: { 2 002  1 005  4 0FF  3 001  5 FFF } ; just to make it easier to trace op & data
    
    
    parse/all RAM emulator
    
    
    print  "^/^/Yes that error is on purpose, I added the 5FFF bytes^/in the 'RAM' just to trigger it  :-)^/"
    
    print "notice that it doesn't run the NOP (at address #0004), ^/since we used the JMP opcode to jump over it.^/"
    
    print "also notice that the first instruction is an increment ^/for the address which is jumped (which is misaligned on 'boot')^/"
    
    ask "press enter to continue"
    

    The output is as follows:

    increment value at address:  #{0002}
    jumping to  #{0006}
    decrement value at address:  #{0001}
    
    
    
    ERROR:  illegal opcode AT:  #{5FFF}  offset[ 8 ]
    
    
    Yes that error is on purpose, I added the 5FFF bytes
    in the 'RAM' just to trigger it  :-)
    
    notice that it doesn't run the NOP (at address #0004),
    since we used the JMP opcode to jump over it.
    since we used the JMP opcode to jump over it.
    
    also notice that the first instruction is an increment
    for the address which is jumped (which is misaligned on 'boot')
    
    press enter to continue
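
    To see why the jump lands on #{0006} even though the RAM literal says 1005, the INC step can be replayed by hand. This is a small trace, again assuming Rebol 2 (it relies on TO-HEX returning 8 hex digits, as i32-to-binary does); old and new are illustrative names, not part of the script.

    ```rebol
    REBOL []

    RAM: #{2002100540FF30015FFF}

    ; INC 002 reads the 2 bytes at byte offset 2 (series index 3): #{1005}
    old: to-integer copy/part at RAM 3 2     ; 4101
    new: old + 1                             ; 4102, which is #{1006}

    ; write the low 2 bytes back, the same way INC and i32-to-binary do
    change at RAM 3 at (load join "#{" [form to-hex new "}"]) 3

    probe RAM    ; #{2002100640FF30015FFF}
    ```

    The second instruction is now 1006, so JMP moves to byte offset 6 (the DEC at 3001) and the NOP at offset 4 is never executed.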