
How to compile Rust to LLVM bitcode including dependencies?


I'm working on verifying some Rust code using SAW. SAW requires that you compile to LLVM bitcode, which you can then import and verify. I know you can generate bitcode using the --emit=llvm-bc flag to rustc, and this works great for projects without dependencies.

The issue comes when trying to compile a project which makes use of external crates. Here's an example Cargo.toml file:

[package]
name = "foobar"
version = "0.1.0"
edition = "2018"

[dependencies]
pythagoras = "0.1.1"

And here's a basic src/lib.rs we might want to compile & verify:

pub use pythagoras;

#[no_mangle]
pub extern "C" fn calc_hypot(a: u32, b: u32) -> f64 {
    pythagoras::theorem(a, b)
}

We can compile this to bitcode like this: RUSTFLAGS="--emit=llvm-bc" cargo build --release. The issue is that the bitcode for the current crate and for each of its dependencies is generated separately (in target/release/deps/foobar-something.bc and target/release/deps/pythagoras-somethingelse.bc). They're only combined when the final library is linked.

Is there any way to generate a single bitcode file containing both the current crate and all its dependencies, so that this file can be imported and won't refer to any external names? I realise this is a pretty niche case, so hacky solutions (e.g. compiling to a C static library, then somehow converting that back to LLVM bitcode) are also completely reasonable.


Solution

  • Expanding on Aiden4's comment:

    • Delete the current target directory to prevent any old artifacts from being used: rm -r target/
    • Compile it with RUSTFLAGS="--emit=llvm-bc" cargo build --release
    • Link the bitcode files together with llvm-link target/release/deps/*.bc > withdeps.bc
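    The three steps above can be scripted if you need to repeat them. Below is a minimal sketch in Rust with hypothetical function names (collect_bitcode, link_bitcode). It gathers the .bc files from a deps directory and hands them to llvm-link. It assumes llvm-link is on PATH and comes from an LLVM release at least as new as the one your rustc was built with (rustc --version --verbose reports that version). Passing -o does the same job as the shell redirect.

    ```rust
    use std::fs;
    use std::io;
    use std::path::{Path, PathBuf};
    use std::process::{Command, ExitStatus};

    /// Collect every `.bc` file directly inside `deps_dir`, sorted so the
    /// llvm-link command line is deterministic between runs.
    fn collect_bitcode(deps_dir: &Path) -> io::Result<Vec<PathBuf>> {
        let mut files = Vec::new();
        for entry in fs::read_dir(deps_dir)? {
            let path = entry?.path();
            if path.extension().and_then(|ext| ext.to_str()) == Some("bc") {
                files.push(path);
            }
        }
        files.sort();
        Ok(files)
    }

    /// Run `llvm-link <inputs...> -o <output>` (assumes llvm-link is on PATH).
    fn link_bitcode(inputs: &[PathBuf], output: &Path) -> io::Result<ExitStatus> {
        Command::new("llvm-link")
            .args(inputs)
            .arg("-o")
            .arg(output)
            .status()
    }
    ```

    To link everything, call collect_bitcode(Path::new("target/release/deps")) and pass the result to link_bitcode(&files, Path::new("withdeps.bc")). Sorting does not change the linked result. It only keeps the command line stable, which makes failures easier to compare.
    
    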

    That will get you almost all dependencies. It turns out, though, that every Rust program implicitly depends on either core or std (you can avoid this with the unstable #![no_core] attribute, but good luck actually getting anything to compile that way), so you probably want the bitcode for those too.

    The easiest way to do that is to compile the standard library from source to bitcode. cargo has experimental, nightly-only support for building the standard libraries from source, so just append -Z build-std --target x86_64-unknown-linux-gnu to your cargo build command, adjusting the target triple for your platform. This also needs the rust-src component (rustup component add rust-src --toolchain nightly). Because -Z build-std requires --target, the build files are put in a target-specific directory, target/x86_64-unknown-linux-gnu/release/deps/ in this case. The target-less directory (target/release/deps/) only holds build dependencies of the standard libraries: we don't want those!
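    To make the flags and the directory layout concrete, here is a minimal sketch with hypothetical helper names (bitcode_build_command, deps_dir). It builds the same cargo invocation with std::process::Command without running it, and computes where the per-crate .bc files end up. It assumes a rustup-managed install, because the +nightly argument is interpreted by rustup's cargo proxy rather than by cargo itself.

    ```rust
    use std::path::PathBuf;
    use std::process::Command;

    /// Build (but don't run) the cargo invocation described above.
    /// `+nightly` is handled by rustup's `cargo` proxy; `-Z build-std`
    /// is only accepted by nightly cargo.
    fn bitcode_build_command(triple: &str) -> Command {
        let mut cmd = Command::new("cargo");
        cmd.args(["+nightly", "build", "--release", "-Z", "build-std", "--target", triple])
            .env("RUSTFLAGS", "--emit=llvm-bc");
        cmd
    }

    /// Directory holding the per-crate `.bc` files. Passing `--target`
    /// moves artifacts under `target/<triple>/`; without it they stay in
    /// `target/<profile>/deps`.
    fn deps_dir(target_dir: &str, triple: Option<&str>, profile: &str) -> PathBuf {
        let mut dir = PathBuf::from(target_dir);
        if let Some(triple) = triple {
            dir.push(triple);
        }
        dir.push(profile);
        dir.push("deps");
        dir
    }
    ```

    Calling bitcode_build_command("x86_64-unknown-linux-gnu").status() would run the build. deps_dir("target", Some("x86_64-unknown-linux-gnu"), "release") then gives the directory whose .bc files should be linked.
    
    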

    We don't want to link all of the standard libraries, though. We really only need std and its dependencies: proc_macro isn't needed here, since this crate isn't a proc-macro. We also need to link exactly one of panic_abort or panic_unwind, matching the panic strategy we compile with (-C panic=unwind or -C panic=abort). The default is unwind, so let's delete the other one, panic_abort. Let's send those libraries to the chopping block: rm target/x86_64-unknown-linux-gnu/release/deps/{panic_abort,proc_macro}-*.bc.
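    If you'd rather not rely on shell brace expansion, you can do the same filtering while collecting the inputs. This is a sketch with hypothetical helper names (crate_name, keep_for_link, EXCLUDED). It relies on crate names never containing "-", since Cargo turns "-" into "_". That means everything before the first "-" in an artifact's file name is the crate name, which is exactly what the panic_abort-*.bc glob matches on.

    ```rust
    /// Crates whose bitcode is dropped before linking: panic_unwind is kept
    /// (the default strategy), and this crate is not a proc-macro.
    const EXCLUDED: [&str; 2] = ["panic_abort", "proc_macro"];

    /// Crate name of a bitcode artifact such as `panic_abort-1a2b3c4d.bc`,
    /// or `None` if the file isn't of the form `<crate>-<suffix>.bc`.
    fn crate_name(file_name: &str) -> Option<&str> {
        let stem = file_name.strip_suffix(".bc")?;
        stem.split_once('-').map(|(name, _rest)| name)
    }

    /// Whether a file in the deps directory should be passed to llvm-link.
    fn keep_for_link(file_name: &str) -> bool {
        crate_name(file_name).map_or(false, |name| !EXCLUDED.contains(&name))
    }
    ```

    Applied to each file name in the deps directory, keep_for_link leaves std, core, alloc, panic_unwind and your own crates in the list and drops panic_abort and proc_macro.
    
    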

    Let's try linking for real this time:

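    rm -rf target/
    # -Z flags need a nightly toolchain; -Z build-std also needs: rustup component add rust-src --toolchain nightly
    RUSTFLAGS="--emit=llvm-bc" cargo +nightly build --release -Z build-std --target x86_64-unknown-linux-gnu
    # keep panic_unwind (the default panic strategy); proc_macro isn't needed
    rm target/x86_64-unknown-linux-gnu/release/deps/{panic_abort,proc_macro}-*.bc
    llvm-link target/x86_64-unknown-linux-gnu/release/deps/*.bc > withalldeps.bc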
    

    Yay, it worked! Well, except for the calls to undefined functions that still managed to slip through. __rust_alloc, __rust_dealloc, __rust_realloc, and __rust_alloc_zeroed aren't defined in any crate's bitcode. rustc normally generates them in an allocator shim when it produces the final linked artifact, and that shim forwards to the default allocator or to a #[global_allocator]. The standard library also depends on libpthread and dlsym, which are language-agnostic libraries/functions usually implemented in C. If you need those, you can compile them to bitcode with clang, using a libc implementation that supports being built with Clang (I don't think GNU libc does; musl might work here?). Also, if you are compiling an executable, the linked module has trouble getting from _start to main.
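    To see which undefined references remain, one option is to list them with llvm-nm --undefined-only withalldeps.bc and split out the allocator entry points named above. This is not part of the original steps. The sketch below uses hypothetical helper names (undefined_symbols, classify, ALLOCATOR_SHIM). It assumes llvm-nm is on PATH and that each output line ends with the symbol name. It also uses the unmangled allocator names from this answer, and other rustc versions may name these symbols differently.

    ```rust
    use std::io;
    use std::path::Path;
    use std::process::Command;

    /// Allocator entry points that rustc normally supplies via its allocator shim.
    const ALLOCATOR_SHIM: [&str; 4] = ["__rust_alloc", "__rust_dealloc", "__rust_realloc", "__rust_alloc_zeroed"];

    /// Run `llvm-nm --undefined-only` on a bitcode file and return its stdout.
    fn undefined_symbols(bitcode: &Path) -> io::Result<String> {
        let out = Command::new("llvm-nm").arg("--undefined-only").arg(bitcode).output()?;
        if !out.status.success() {
            let msg = String::from_utf8_lossy(&out.stderr).into_owned();
            return Err(io::Error::new(io::ErrorKind::Other, msg));
        }
        Ok(String::from_utf8_lossy(&out.stdout).into_owned())
    }

    /// Split symbol names (the last whitespace-separated token of each line)
    /// into allocator-shim symbols and everything else (libc, pthread, dlsym...).
    fn classify(nm_output: &str) -> (Vec<&str>, Vec<&str>) {
        let mut shim = Vec::new();
        let mut other = Vec::new();
        for name in nm_output.lines().filter_map(|line| line.split_whitespace().last()) {
            if ALLOCATOR_SHIM.contains(&name) {
                shim.push(name);
            } else {
                other.push(name);
            }
        }
        (shim, other)
    }
    ```

    The first list shows what you must provide yourself, for example small C definitions forwarding to malloc and free. The second list is what a Clang-compatible libc would need to cover.
    
    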