Compile Rust to a single, interpretable LLVM `.ll` file


Context

The research group I work with develops a verified LLVM* interpreter. We are currently working on adding support for Rust-generated LLVM.

Compiling a simple hello world program with rustc --emit=llvm-ir produces valid LLVM code; however, the code naturally contains references to functions from the Rust standard library (std), marked as hidden within the generated file.

The need

Because we have an interpreter, we need either a single, fully interpretable .ll file or a set of .ll files that together define all referenced functions, which we can then link within our system.

It seems that because Rust relies heavily on cargo and/or rustc for linking with the standard library, generating executables from LLVM produced with --emit=llvm-ir is not a simple, natively supported feature. I would like to know if an elegant solution for this exists.

Current solution, request for improvement

One post proposes a solution for building with dependencies: compile the standard library to LLVM bitcode (.bc) and link everything with llvm-link. We could then use llvm-dis to get a .ll file. This, however, can lead to undefined references due to how rustc interacts with the LLVM assembler, as the post itself notes:

"Yay, it worked! Well, except for the calls to undefined functions in there that still managed to slip through. __rust_alloc, __rust_dealloc, __rust_realloc, and __rust_alloc_zeroed are magic functions that are defined if you use Rust's LLVM fork. The standard library also depends on libpthread and dlsym which are language-agnostic libraries/functions that are usually implemented in C. You can use clang and a libc implementation that supports being compiled with Clang (GNU libc doesn't, I think musl might work here?) to get that if needed. Also if you are compiling to an executable it has trouble finding main from _start."

We have run into several problems of this nature while replicating this post. Hence, we are seeking a better solution, if one exists. Thanks in advance.

Edit: Posted own answer with build script.

*we interpret a close subset of LLVM.


Solution

  • Found own answer: by adjusting the LTO settings in the build scripts from the aforementioned post (the key was the -Clto flag), we can now compile a .rs file into a single .ll file and build an executable from it. Here is a compile script (slightly more legible than the Makefile we currently use).

    An extension for programs with dependencies is in progress and will be posted here if I remember to.

    #!/bin/bash
    
    set -x
    
    OUTPUT_DIR=`pwd`
    LLVM_HOME=/opt/homebrew/Cellar/llvm/19.1.7
    RUSTUP_TOOLCHAIN_LIB=/Users/omitted/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/aarch64-apple-darwin/lib
    
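    # Compile main.rs with fat LTO and emit textual LLVM IR (writes main.ll).
    # The key for in-house linking was -Clto, which pulls the standard library code that main.rs uses into the same module.
    rustc -Clto --emit=llvm-ir main.rs
    
    # Optimizer. -S keeps the output as textual IR; without it opt writes bitcode into main.ll.
    $LLVM_HOME/bin/opt -S -o main.ll main.ll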
    
    # .ll -> .o
    $LLVM_HOME/bin/llc -filetype=obj main.ll
    
    # Complete the linking to executable.
    # -nodefaultlibs drops the default libraries; System, resolv, and libc are then linked explicitly
    # (on macOS, libSystem also provides the math library, so no separate -lm is needed).
    # Also strip dead code so we don't have tons of rust std library code that isn't referenced.
    $LLVM_HOME/bin/clang -m64 -Wl,-dead_strip -nodefaultlibs -lSystem -lresolv -lc main.o -o main
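
To check how close a generated .ll file is to being fully interpretable, it can help to list the functions it declares but does not define. The following is only a rough sketch: list_undefined is a hypothetical helper name, the sample module stands in for a real main.ll, and the pattern assumes each declaration sits on a single top-level `declare` line. LLVM intrinsics (names starting with llvm.) show up in the list as well and would need to be handled by the interpreter itself.

```shell
#!/bin/bash

# Hypothetical helper: print the names of functions that are declared
# (external) but not defined in a textual LLVM IR file.
list_undefined() {
    # External functions appear as top-level `declare` lines; the symbol
    # name is the text between '@' and the opening parenthesis.
    grep -E '^declare ' "$1" \
        | sed -E 's/^declare .*@"?([^"(]+)"?\(.*/\1/' \
        | LC_ALL=C sort -u
}

# Small stand-in module (an assumption, not real rustc output).
sample=$(mktemp)
cat > "$sample" <<'EOF'
declare ptr @__rust_alloc(i64, i64)
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
define i32 @main() {
  ret i32 0
}
EOF

undefined=$(list_undefined "$sample")
echo "$undefined"
rm -f "$sample"
```

On a real build, calling list_undefined main.ll after the rustc step gives a quick view of which symbols (for example libc functions) still have to be supplied outside the module.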