Tags: c, compiler-construction, clang, llvm, bitcode

How to compile and keep "unused" C declarations with clang -emit-llvm


Context

I'm writing a compiler for a language that requires lots of runtime functions. I'm using LLVM as my backend, so the codegen needs LLVM-level declarations for everything in the runtime (functions, structs, etc.). Instead of defining all of them manually using the LLVM APIs or handwriting the LLVM IR, I'd like to write the headers in C and compile them to bitcode that the compiler can pull in with LLVMParseBitcodeInContext2.

Issue

The issue I'm having is that clang doesn't seem to keep any of the type declarations that aren't used by any function definitions. Clang has -femit-all-decls, which sounds like it's supposed to solve this, but unfortunately it doesn't, and Googling suggests it's misnamed, as it only affects unused definitions, not declarations.

I then thought that if I compiled only the headers into .gch files, I could pull them in with LLVMParseBitcodeInContext2 the same way (since the docs say they use "the same" bitcode format). However, doing so fails with error: Invalid bitcode signature, so something must be different. Perhaps the difference is small enough to work around?

Any suggestions or relatively easy workarounds that can be automated for a complex runtime? I'd also be interested if someone has a totally alternative suggestion on approaching this general use case, keeping in mind I don't want to statically link in the runtime function bodies for every single object file I generate, just the types. I imagine this is something other compilers have needed as well so I wouldn't be surprised if I'm approaching this wrong.


e.g. given this input:

runtime.h

struct Foo {
  int a;
  int b;
};

struct Foo * something_with_foo(struct Foo *foo);

I need a bitcode file with this equivalent IR

runtime.ll

; ...etc...

%struct.Foo = type { i32, i32 }

declare %struct.Foo* @something_with_foo(%struct.Foo*)

; ...etc...

I could write it all by hand, but this would be duplicative as I also need to create C headers for other interop and it'd be ideal not to have to keep them in sync manually. The runtime is rather large. I guess I could also do things the other way around: write the declarations in LLVM IR and generate the C headers.


Someone else asked about this years back, but the proposed solutions are rather hacky and fairly impractical for a runtime of this size and type complexity: Clang - Compiling a C header to LLVM IR/bitcode


Solution

  • Clang doesn't actually filter out the unused declarations. It defers emitting forward declarations until their first use: whenever a function is used, clang checks whether it has been emitted already, and if not, it emits the function declaration.

    You can look at these lines in the clang repo.

    // Forward declarations are emitted lazily on first use.
    if (!FD->doesThisDeclarationHaveABody()) {
      if (!FD->doesDeclarationForceExternallyVisibleDefinition())
        return;
    

    The simple fix here would be to either comment out the last two lines or just add && false to the second condition.

    // Forward declarations are emitted lazily on first use.
    if (!FD->doesThisDeclarationHaveABody()) {
      if (!FD->doesDeclarationForceExternallyVisibleDefinition() && false)
        return;
    

    This will cause clang to emit a declaration as soon as it sees it. It might also change the order in which definitions appear in your .ll (or .bc) files, assuming that is not an issue for you.

    To make it cleaner, you can also add your own command-line flag, e.g. --emit-all-declarations, and check it here before returning early.
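
On the .gch attempt from the question: as far as I know, a precompiled header is a serialized clang AST stored in LLVM's bitstream container, not LLVM IR. It shares the container encoding but starts with a different magic ('C' 'P' 'C' 'H' rather than 'B' 'C' 0xC0 0xDE), which would explain the Invalid bitcode signature error, and the payload is not something the IR bitcode reader could parse even if the magic were patched. Below is a minimal sketch for checking what kind of file you are about to hand to LLVMParseBitcodeInContext2. The names classify_magic, classify_file, magic_kind and magic_kind_name are made up for this example, and it assumes the default raw PCH output (not an object-file container).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper types for this sketch; not part of LLVM or clang. */
enum magic_kind {
    MAGIC_UNKNOWN = 0,
    MAGIC_LLVM_BITCODE,         /* 'B' 'C' 0xC0 0xDE: raw LLVM IR bitcode */
    MAGIC_LLVM_BITCODE_WRAPPER, /* 0x0B17C0DE, stored little-endian */
    MAGIC_CLANG_AST             /* 'C' 'P' 'C' 'H': clang serialized AST (PCH) */
};

/* Classify a buffer by its first four bytes. */
enum magic_kind classify_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char bc[4]   = { 'B', 'C', 0xC0, 0xDE };
    static const unsigned char wrap[4] = { 0xDE, 0xC0, 0x17, 0x0B };
    static const unsigned char pch[4]  = { 'C', 'P', 'C', 'H' };

    assert(len == 0 || buf != NULL); /* a non-empty buffer must be non-NULL */
    if (len < 4)
        return MAGIC_UNKNOWN;
    if (memcmp(buf, bc, 4) == 0)
        return MAGIC_LLVM_BITCODE;
    if (memcmp(buf, wrap, 4) == 0)
        return MAGIC_LLVM_BITCODE_WRAPPER;
    if (memcmp(buf, pch, 4) == 0)
        return MAGIC_CLANG_AST;
    return MAGIC_UNKNOWN;
}

/* Human-readable label for diagnostics. */
const char *magic_kind_name(enum magic_kind kind)
{
    switch (kind) {
    case MAGIC_LLVM_BITCODE:         return "LLVM IR bitcode";
    case MAGIC_LLVM_BITCODE_WRAPPER: return "LLVM IR bitcode (wrapper header)";
    case MAGIC_CLANG_AST:            return "clang serialized AST (PCH)";
    default:                         return "unknown";
    }
}

/* Read the first four bytes of a file and classify them.
   Returns MAGIC_UNKNOWN if the file cannot be opened or is too short. */
enum magic_kind classify_file(const char *path)
{
    unsigned char head[4];
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return MAGIC_UNKNOWN;
    n = fread(head, 1, sizeof head, f);
    fclose(f);
    return classify_magic(head, n);
}
```

Calling classify_file on the .gch and on a file produced with clang -emit-llvm -c should make the difference visible; only the two LLVM IR bitcode kinds are worth passing to the bitcode reader.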