Tags: assembly, virtual-machine, machine-code

Toolkits for compiling into a custom machine language


Let's say I built an interpreter (more like a virtual machine) capable of running a selection of basic commands. Naturally, I don't want to use a hex editor to build the machine code by hand (the machine code is totally made up and not similar to any other architecture).

Are there any pre-existing tools for such situations? I was thinking of using some tool to compile a high-level language like C into a basic assembler syntax, while restricting the compiler to a selection of asm commands (only basic mov, ALU commands, push/pop, calls and jumps).

Of course, one option is to build a whole new compiler from scratch, but that obviously sucks and feels like reinventing the wheel. Another option would be to write a script that works on the generated asm code and replaces the unsupported commands with others (like splitting lea into mov and arithmetic), but that would be quite some work for the more complicated commands. I would like to cut my own work down to writing an assembler at most, preferably one that only handles a selected subset of commands (so none of these fancy complicated x86 commands like ASCII/BCD arithmetic, xchg, string commands or even lea) to simplify things. Would that even be a feasible approach, or is there an easier way to achieve what I want? I am sure I am not the first one doing something like this. Ideally, I would need a compiler where I can describe the target architecture and its capabilities in detail.

Has anyone done something similar before? I don't really know where to start, but surely there must be some tools available to help with this.

Edit: To make it clear, I am indeed looking for tools to build bytecode for a self-defined ISA. I mentioned C as a high-level language, but that was just an example. I am just looking for a way to program simple snippets for a custom-defined architecture without writing the bytecode by hand in a hex editor, preferably with a higher-level language. My idea was that if I could minimise the instruction set assumed by some standard compiler, I could write a simple script to translate its output into my custom machine code.


Solution

  • You want to use some JIT-compiling library. There are lots of them, at least on Linux: libgccjit, LLVM, libJIT, GNU lightning, asmjit, etc... Both libgccjit and LLVM are capable of fancy optimizations.

    (At first I understood that you wanted to make a new compiler or JIT bytecode interpreter for your existing x86-64 PC.)

    "Ideally, I would need a compiler where I can describe the target architecture and its capabilities in detail."

    You might be interested in iburg, a code-generator generator based on tree pattern matching (and also in some internals of GCC and/or Clang/LLVM).

    If you are indeed inventing a new ISA (perhaps as some low-level bytecode), you could adapt and port GCC to it (write a new machine-description file, etc.). That could take you a few months of work. Ask for help on the GCC mailing list. Read the GCC internals documentation. Be aware of GIMPLE, GCC's intermediate representation.

    If you want a naive, non-optimizing C compiler (or a compiler for a C subset) targeting your new bytecode, you could take inspiration from tinycc, which shows that writing a naive C-like compiler from scratch is quite feasible (and might take less time than diving into GCC internals). But such a compiler won't optimize at all!
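
    As a rough illustration of how small a naive, non-optimizing code generator can be, here is a minimal sketch. It is not how tinycc works internally. It assumes a hypothetical stack machine with made-up mnemonics (PUSH, LOAD, ADD, SUB, MUL), and it borrows Python's own expression parser instead of parsing C, just to keep the example short.

```python
import ast

# Hypothetical stack-machine mnemonics; they are not taken from the question's ISA.
BINOPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def compile_expr(source):
    """Translate an arithmetic expression into a list of (mnemonic, operand) pairs."""
    code = []

    def emit(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.Name):
            code.append(("LOAD", node.id))
        elif isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
            emit(node.left)    # post-order walk: both operands first,
            emit(node.right)
            code.append((BINOPS[type(node.op)], None))  # then the operator
        else:
            raise SyntaxError(f"unsupported construct: {ast.dump(node)}")

    emit(ast.parse(source, mode="eval").body)
    return code


def run(code, env):
    """Reference interpreter for the emitted code, used only to check the result."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(env[arg])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append({"ADD": a + b, "SUB": a - b, "MUL": a * b}[op])
    return stack.pop()
```

    For example, `compile_expr("x * 2 + 1")` yields LOAD x, PUSH 2, MUL, PUSH 1, ADD. Nothing is optimized: every subexpression is evaluated exactly as written.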

    You should also consider compiling your language to C (and leaving low-level optimization and code generation to the system C compiler). This is quite a popular approach.
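
    The compile-to-C approach can be sketched in a few lines. This is only an illustration under assumptions: the source language is a tiny expression language (again parsed with Python's `ast` module for brevity), every value is a C `long`, and the helper names `to_c_expr` and `to_c_function` are made up for this example.

```python
import ast

# Operators of the toy source language mapped to their C spelling.
C_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def to_c_expr(node):
    """Render an expression tree as a fully parenthesized C expression."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return str(node.value)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.BinOp) and type(node.op) in C_OPS:
        left, right = to_c_expr(node.left), to_c_expr(node.right)
        return f"({left} {C_OPS[type(node.op)]} {right})"
    raise SyntaxError(f"unsupported construct: {ast.dump(node)}")


def to_c_function(name, params, source):
    """Wrap the translated expression in a C function taking long parameters."""
    body = to_c_expr(ast.parse(source, mode="eval").body)
    args = ", ".join(f"long {p}" for p in params) or "void"
    return f"long {name}({args}) {{\n    return {body};\n}}\n"
```

    The generated text is then handed to the system C compiler, which takes care of optimization and code generation for whatever target that compiler supports.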

    Notice also that once you have completely specified an ISA, writing an assembler for it is a simple exercise (and once you have an assembler, you no longer need to fiddle with hexadecimal bytes to write code for your ISA).
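
    To give an idea of the size of that exercise, here is a minimal two-pass assembler sketch. The opcode table, the one-byte operand encoding and the `;` comment syntax are all assumptions made up for this example; a real ISA would substitute its own encoding.

```python
# Hypothetical opcode table: mnemonic -> (opcode byte, number of one-byte operands).
OPCODES = {
    "HALT": (0x00, 0),
    "PUSH": (0x01, 1),
    "ADD": (0x02, 0),
    "JMP": (0x03, 1),
    "JZ": (0x04, 1),
}


def assemble(text):
    """Two-pass assembler: pass 1 assigns label addresses, pass 2 emits bytes."""
    labels, program, address = {}, [], 0
    for line in text.splitlines():
        line = line.split(";")[0].strip()      # drop comments and blank lines
        if not line:
            continue
        if line.endswith(":"):                 # label definition
            labels[line[:-1]] = address
            continue
        mnemonic, *operands = line.split()
        opcode, arity = OPCODES[mnemonic.upper()]
        if len(operands) != arity:
            raise ValueError(f"{mnemonic}: expected {arity} operand(s)")
        program.append((opcode, operands))
        address += 1 + arity                   # opcode byte plus operand bytes

    out = bytearray()
    for opcode, operands in program:
        out.append(opcode)
        for operand in operands:
            # An operand is either a label or an integer literal (decimal or 0x...).
            value = labels[operand] if operand in labels else int(operand, 0)
            if not 0 <= value <= 0xFF:
                raise ValueError(f"operand out of range: {operand}")
            out.append(value)
    return bytes(out)
```

    Labels are resolved in the second pass, so forward jumps such as `jz end` work even though `end` is defined later in the source.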

    You may be interested in homoiconic or multi-stage programming languages. Look into Lisp (notably Common Lisp and its SBCL implementation) and into MetaOCaml.


    Your question is unclear

    (even with the new edit)

    Are you inventing a new bytecode, a new programming language, or a new ISA?

    You need to read SICP and The Dragon Book to at least get the right terminology and concepts (since in its initial form your question is unclear and confusing). You should also be interested in Scott's Programming Language Pragmatics and probably Queinnec's Lisp In Small Pieces.


    "My idea was just, that if I could minimize the instruction set assumed by some standard compiler, I could write some simple script to just translate it into my custom machine code."

    That is probably false. One-instruction set computers (OISC) have been invented, but in practice such single-instruction ISAs are not efficient to implement, so minimizing the instruction set all the way down to one instruction is not a good idea. Using such a one-instruction ISA as an intermediate representation in your compiler is not a good idea either.
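
    To see why, consider SUBLEQ ("subtract and branch if less than or equal to zero"), a well-known one-instruction design. The sketch below is a small interpreter for it. The memory layout and the convention that a negative jump target halts the machine are assumptions chosen for this example.

```python
def run_subleq(memory, max_steps=10_000):
    """Run SUBLEQ code: mem[b] -= mem[a]; jump to c if the result is <= 0."""
    mem, pc = list(memory), 0
    for _ in range(max_steps):
        if pc < 0:                  # assumed convention: a negative target halts
            return mem
        a, b, c = mem[pc:pc + 3]    # each instruction is three memory cells
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    raise RuntimeError("step limit reached")


# Computing Y = Y + X takes three instructions and a scratch cell Z.
# Cells 0..8 hold the code; X is cell 9, Y is cell 10, Z is cell 11.
ADD_PROGRAM = [
    9, 11, 3,     # Z -= X   (Z becomes -X)
    11, 10, 6,    # Y -= Z   (Y becomes Y + X)
    11, 11, -1,   # Z -= Z   (Z becomes 0, then halt)
    7, 5, 0,      # data: X = 7, Y = 5, Z = 0
]
```

    Running `run_subleq(ADD_PROGRAM)` leaves 12 in cell 10. A single addition already costs three instructions and a scratch cell, which hints at how much code a compiler would have to emit for anything larger.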