Tags: assembly, compilation, riscv, machine-code

How to compile hexadecimally encoded instructions into an ELF for RISCV simulation?


I have written a random RV32I instruction generator that emits instructions in hexadecimal format (and in binary format as well, if needed).

For a short example, I have:

8e900d13

00000013

0b700e13

00000013

00000013

d7400f93

Nothing more. How can I use the RISC-V toolchain to turn this into an ELF, or something else that I can run on a RISC-V processor in simulation?


Solution

  • One approach is some text processing to turn this into usable input for a standard toolchain, like GAS (GNU binutils) or clang.

    sed 's/^/.word 0x/' foo.hex > foo.s would turn each line of your hex into asm source like .word 0x8e900d13, which you could assemble with standard tools. (This simple version prefixes every line, blank ones included, so it assumes the file has no blank lines.) An assembler reads source lines and assembles bytes into the current section of the output file; it doesn't care whether the source line was li s10, -1815 or addi s10, x0, -1815 or .word 0x8e900d13, or the same bytes written with .byte.
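To illustrate that .word and .byte are interchangeable here, the following is a minimal sketch that rewrites each 8-digit hex word as four .byte values. It emits the least-significant byte first because RISC-V instructions are stored little-endian, which matches the byte order in the llvm-objdump listing (13 0d 90 8e for 0x8e900d13). The file names word.hex and bytes.s are hypothetical, and the input is assumed to hold exactly one 8-digit word per line.

```shell
# Hypothetical input: two instruction words from the question, one per line.
printf '8e900d13\n00000013\n' > word.hex

# Split each 8-digit word into four 2-digit bytes and emit them
# least-significant byte first (little-endian storage order).
sed -E 's/^([[:xdigit:]]{2})([[:xdigit:]]{2})([[:xdigit:]]{2})([[:xdigit:]]{2})$/.byte 0x\4, 0x\3, 0x\2, 0x\1/' word.hex > bytes.s

cat bytes.s
```

Assembling bytes.s should give the same machine code as the .word form of the same two instructions.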

    With some additional text around it, like .global _start and a _start: label, you would have a source file that you could usefully assemble and link with gcc, given suitable options.


    For example, this version skips your blank lines (and any other line that doesn't begin with optional whitespace followed by an alphanumeric character). sed -E uses extended regular expressions, so + means one or more repeats, unlike in the default basic regex syntax. \s* matches zero or more whitespace characters. The initial /pattern/ applies the rule only to matching lines, and the s/pat/subs/ replaces the start of the line (and any leading whitespace) with the text .word 0x:

    $ sed -E '/^\s*[[:alnum:]]+/  s/^\s*/.word 0x/' foo.hex > foo.s
    $ cat foo.s
    .word 0x8e900d13
    
    .word 0x00000013
    
    .word 0x0b700e13
    
    .word 0x00000013
    
    .word 0x00000013
    
    .word 0xd7400f93
    $ clang -target riscv32 -c foo.s
    $ llvm-objdump -d foo.o
    
    foo.o:  file format elf32-littleriscv
    
    Disassembly of section .text:
    
    00000000 <.text>:
           0: 13 0d 90 8e   li      s10, -1815
           4: 13 00 00 00   nop
           8: 13 0e 70 0b   li      t3, 183
           c: 13 00 00 00   nop
          10: 13 00 00 00   nop
          14: 93 0f 40 d7   li      t6, -652
    

    Of course you could just have your instruction generator output asm source like .word 0x... instead of doing that separately. And pick your favourite text-processing tool for this task; sed is easy for this if you're familiar with old-school Unix tools.
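As a rough sketch of the generator emitting assembler source directly, the snippet below prints each 32-bit value as a .word directive. The function name emit_word and the output file gen.s are hypothetical, and the loop over three fixed values only stands in for whatever your real generator produces.

```shell
# Hypothetical helper: print one 32-bit instruction word as an assembler directive.
emit_word() {
    printf '.word 0x%08x\n' "$1"
}

# Stand-in for the generator: the first three instructions from the question.
for insn in 0x8e900d13 0x00000013 0x0b700e13; do
    emit_word "$insn"
done > gen.s

cat gen.s
```

The %08x format keeps leading zeros, so 0x00000013 stays a full 8-digit word in the output.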

    Presumably you know what your simulator wants in a linked executable so you can sort that out.

    Perhaps run echo -e '_start:\n.global _start' > foo.s first and have sed append to that file (>> instead of >), or write a sed command that uses the i command to insert the extra text before line 1 (sed line addresses start at 1).
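The append approach could look something like the sketch below. It uses printf rather than echo -e and [[:space:]] rather than \s purely for portability across shells and sed implementations. The contents of foo.hex are recreated from the question so the snippet is self-contained. The commented-out link command is an assumption: it presumes a bare-metal GCC named riscv64-unknown-elf-gcc is installed, and your simulator may need a different load address or a linker script.

```shell
# Recreate the question's input, blank lines included (for a self-contained demo).
printf '8e900d13\n\n00000013\n\n0b700e13\n\n00000013\n\n00000013\n\nd7400f93\n' > foo.hex

# Write the entry-point boilerplate first...
printf '.global _start\n_start:\n' > foo.s

# ...then append the converted words; blank lines pass through unchanged.
sed -E '/^[[:space:]]*[[:alnum:]]+/ s/^[[:space:]]*/.word 0x/' foo.hex >> foo.s

cat foo.s

# Assumed toolchain name and options, shown only as a starting point:
# riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -nostdlib -static -o foo.elf foo.s
```

The resulting foo.s starts with the _start symbol definition, followed by the six .word lines, so the first generated instruction sits at the entry point.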