
Writing a compiler with Assembly?


I am developing a domain-specific programming language, and I have a question about compiler construction: if a compiler is meant to generate machine code directly, do you need to study assembly in order to implement it?

If not, what are the alternatives for having such a compiler produce binary executables?


Solution

  • If performance is important, you probably don't want to try to generate assembly yourself, unless your domain-specific problem is very simple and specific. Generating efficient asm is much harder than just generating working asm. In a compiler like GCC, optimization passes are more than half the code-base, more than parsing C or even C++.

    Instead, generate something that an existing optimizer like LLVM can deal with, like LLVM-IR. Write a portable front-end for your language, and leave the target-specific stuff and optimization to LLVM or to GCC's back-end. There is a tutorial at https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/index.html

    Of course, to debug your compiler, you may want to learn some assembly to at least know where to start looking in the IR for wrong-code bugs. And of course you'd have to learn LLVM-IR, which is essentially an assembly language.
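
    To give a feel for what "generate LLVM-IR" can mean at its simplest, here is a minimal sketch that prints textual IR for an integer expression. The AST classes (Num, BinOp) and the function emit_llvm are hypothetical names made up for this illustration, not part of LLVM; a real front-end would more likely build IR through LLVM's own APIs, as the tutorial does.

```python
from dataclasses import dataclass
from itertools import count
from typing import Union


# Hypothetical toy AST for a language with integer arithmetic only.
@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str          # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]

# Map source operators to LLVM integer instructions.
LLVM_OPS = {"+": "add", "-": "sub", "*": "mul"}


def emit_llvm(expr: Expr, func_name: str = "main") -> str:
    """Return textual LLVM-IR for a function that returns the value of expr."""
    lines = []
    temps = count(1)  # numbering for temporaries: %t1, %t2, ...

    def gen(node: Expr) -> str:
        # A constant is used directly as an instruction operand.
        if isinstance(node, Num):
            return str(node.value)
        # Emit code for both operands first, then one instruction for this node.
        lhs = gen(node.left)
        rhs = gen(node.right)
        name = f"%t{next(temps)}"
        lines.append(f"  {name} = {LLVM_OPS[node.op]} i32 {lhs}, {rhs}")
        return name

    result = gen(expr)
    body = "\n".join(lines + [f"  ret i32 {result}"])
    return f"define i32 @{func_name}() {{\nentry:\n{body}\n}}\n"


if __name__ == "__main__":
    # (2 + 3) * 4
    print(emit_llvm(BinOp("*", BinOp("+", Num(2), Num(3)), Num(4))))
```

    Saving the output to a .ll file and handing it to clang (or to LLVM's opt and llc tools) should then take care of optimization and machine-code generation. Note the sketch does no optimization at all, not even constant folding; that is the part being delegated.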


    Alternatively, compiling to C is an old-school technique that still works: optimizing C compilers are widely available. (Well-known historical examples include CFortran, and C++ was originally implemented with CFront, which compiled it to C.)
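
    As a rough illustration of the compile-to-C approach, the sketch below turns a toy program (a list of assignments plus one result expression) into C source. The AST classes and the names expr_to_c and emit_c are hypothetical, invented for this example, and it assumes every value fits in a C long.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical toy AST: integer constants, variables, binary operators.
@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str          # a C binary operator such as "+", "-", "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]


def expr_to_c(node: Expr) -> str:
    """Translate an expression tree to a C expression string."""
    if isinstance(node, Num):
        return str(node.value)
    if isinstance(node, Var):
        return node.name
    # Fully parenthesize so C precedence rules never change the meaning.
    return f"({expr_to_c(node.left)} {node.op} {expr_to_c(node.right)})"


def emit_c(assignments: List[Tuple[str, Expr]], result: Expr) -> str:
    """Emit a complete C program that evaluates the assignments and prints result."""
    lines = ["#include <stdio.h>", "", "int main(void) {"]
    for name, expr in assignments:
        lines.append(f"    long {name} = {expr_to_c(expr)};")
    # Cast so the argument type always matches the %ld format specifier.
    lines.append(f'    printf("%ld\\n", (long){expr_to_c(result)});')
    lines.append("    return 0;")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # x = 2 + 3; print x * 4
    program = [("x", BinOp("+", Num(2), Num(3)))]
    print(emit_c(program, BinOp("*", Var("x"), Num(4))))
```

    The generated file can be passed to any C compiler with optimization enabled (for example -O2), so the front-end never has to know about registers or instruction selection.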

    Depending on your domain-specific problem, you might choose to compile to some other high-level language that matches your problem domain. Pick a language that you can target easily and that has a good optimizing compiler or JIT run-time. For example, Julia is reputedly good for number-crunching and, I think, lets you take advantage of parallelism.

    C++ could be a good target if some of its template library functions work well for your problem. Ahead-of-time C++ compilers will make an executable that just depends on some libraries, not a "runtime" like a JVM or something. They can also compile to a library you can easily call from most other things: C and C++ foreign-function interfaces are common in most other languages. Depending on your use-case, this may be important.

    This method will let you use a C, C++, Julia, or whatever debugger to see what the code your compiler generated is doing. So you only need to know that target language.

    Understanding assembly concepts can be useful to understand what C undefined behaviour might produce the symptoms you're seeing, in case of compiler bugs like out-of-bounds array access. But with modern tools like clang -fsanitize=undefined, you can check for many such problems to help verify your compiler.
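
    One way a compile-to-C compiler's test suite could use this is to build every generated program with the sanitizer enabled and then run it. A minimal sketch, assuming clang is installed and on PATH; the helper names sanitizer_command and build_and_run are hypothetical, and only the -fsanitize=undefined flag comes from the text above (-g is added so reports can include source locations).

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def sanitizer_command(c_file, exe_file, compiler="clang"):
    """Build the compiler command line with UndefinedBehaviorSanitizer enabled."""
    return [compiler, "-g", "-fsanitize=undefined", str(c_file), "-o", str(exe_file)]


def build_and_run(c_source, compiler="clang"):
    """Compile generated C with the sanitizer and run the result.

    Returns the CompletedProcess of the run, or None if the compiler
    is not installed. Sanitizer reports, if any, appear on stderr.
    """
    if shutil.which(compiler) is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        c_file = Path(tmp) / "generated.c"
        exe_file = Path(tmp) / "generated"
        c_file.write_text(c_source)
        # check=True raises CalledProcessError if the generated C does not compile.
        subprocess.run(sanitizer_command(c_file, exe_file, compiler), check=True)
        return subprocess.run([str(exe_file)], capture_output=True, text=True)


if __name__ == "__main__":
    demo = '#include <stdio.h>\nint main(void) { puts("ok"); return 0; }\n'
    run = build_and_run(demo)
    if run is None:
        print("clang not found; skipping")
    else:
        print(run.stdout, run.stderr, sep="")
```

    A test harness could then fail any test case whose stderr contains a sanitizer report, which points at a code-generation bug in your compiler rather than in the input program.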


    Also related: Learning to write a compiler