Tags: compiler-construction, llvm, instruction-set

Where is LLVM's interface for describing the instructions of an ISA?


I'm new to LLVM and to compiler development in general. I've finished reading Engineering a Compiler (3rd edition) and concluded that compilers may be a very fun area for me to specialize in, since I love delving into low-level stuff, and I have just had the chance to join a compiler project. I'm looking for someone who knows LLVM to point me to either the documentation or the LLVM source files that define an ISA's instructions. In my project I need to use LLVM's interface for defining instructions in order to automatically emit every instruction of an ISA. I know that the .td files in a particular ISA's backend subdirectory, such as:

llvm-project/llvm/lib/Target/SystemZ/SystemZInstrInfo.td

and its corresponding .h and .cpp files have something to do with it, but this is the first time I've heard of the TableGen language, and I don't know how relevant it actually is to the way LLVM provides the concrete definition of an ISA's instruction set. Any pointers and bits of advice are welcome.


Solution

  • You are right: llvm-project/llvm/lib/Target/<Arch>/<Arch>InstrInfo.td (together with any other .td files it includes) generally holds the definition of the ISA. Specifically, it holds the definitions of all the instructions in the ISA.

    However, the SystemZ file is written by hand, and its irregular structure makes it a bit difficult to follow. Check out the corresponding Hexagon files instead.

    A few words about instruction definitions

    The first instruction in the Hexagon instruction definitions is:

    def A2_abs : HInst<
      (outs IntRegs:$Rd32),
      (ins IntRegs:$Rs32),
      "$Rd32 = abs($Rs32)",
      tc_d61dfdc3, TypeS_2op>, Enc_5e2823 {
      let Inst{13-5} = 0b000000100;
      let Inst{31-21} = 0b10001100100;
      let hasNewValue = 1;
      let opNewValue = 0;
      let prefersSlot3 = 1;
    }
    
    • A2_abs - the instruction's name inside LLVM.
    • HInst - the TableGen class from which all Hexagon instructions are derived; it is defined in the Hexagon backend's .td files.
    • outs - output operands.
    • ins - input operands.
    • IntRegs - register class (the set of all registers that may be used in this position), defined in the Hexagon backend's register description.
    • $Rd32 and $Rs32 - operand names.
    • "$Rd32 = abs($Rs32)" - assembly string, used when printing assembly.
    • tc_d61dfdc3 - instruction itinerary class, which holds the scheduling constraints; the related code is located in llvm-project/llvm/lib/Target/Hexagon/HexagonSchedule*.td.
    • TypeS_2op - Hexagon-specific instruction type.
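If the goal is to walk over every instruction of an ISA programmatically, you don't necessarily have to parse .td files yourself: `llvm-tblgen --dump-json` dumps all fully resolved records (with every `let` and class argument already applied) as JSON. Below is a minimal Python sketch over such a dump. It assumes the JSON layout that, as far as I know, this backend produces (a top-level `!instanceof` map from class names to def names, dag arguments serialized as `[value, name]` pairs, `bits` fields as lists indexed by bit number), so verify it against the output of your LLVM version. The helper functions are hypothetical, not part of LLVM.

```python
import re
from typing import Any, Iterator, List, Mapping, Tuple

# Records as (we assume) produced by something like:
#   llvm-tblgen --dump-json -I llvm/include -I llvm/lib/Target/Hexagon \
#       llvm/lib/Target/Hexagon/Hexagon.td -o hexagon.json
# and loaded with json.load(). The layout assumed below should be checked
# against your LLVM version.
Records = Mapping[str, Any]


def target_instructions(records: Records, namespace: str) -> Iterator[Tuple[str, Mapping[str, Any]]]:
    """Yield (name, record) for every real instruction in `namespace`.

    Assumes records["!instanceof"]["Instruction"] lists every def derived
    from the Instruction class. Generic opcodes (Namespace "TargetOpcode"),
    pseudo instructions and codegen-only defs are skipped.
    """
    for name in records["!instanceof"].get("Instruction", []):
        rec = records[name]
        if rec.get("Namespace") != namespace:
            continue
        if rec.get("isPseudo") or rec.get("isCodeGenOnly"):
            continue
        yield name, rec


def operand_names(dag: Mapping[str, Any]) -> List[str]:
    """Operand names of an (outs ...) or (ins ...) dag.

    Assumes each dag argument is serialized as a [value, name] pair.
    """
    return [name for _value, name in dag.get("args", []) if name is not None]


def fixed_encoding(inst_bits: List[Any]) -> Tuple[int, int]:
    """Return (mask, value) for the constant bits of an `Inst` field.

    Element i of `inst_bits` is assumed to be bit i: 0/1 for bits fixed by
    `let Inst{...} = 0b...`, anything else (null, operand references) for
    bits that are filled in from operands at encoding time.
    """
    mask = value = 0
    for i, bit in enumerate(inst_bits):
        if isinstance(bit, int):  # bool is a subclass of int, so True/False work too
            mask |= 1 << i
            value |= (int(bit) & 1) << i
    return mask, value


_OPERAND_REF = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def render_asm(asm_string: str, operands: Mapping[str, str]) -> str:
    """Substitute $Name / ${Name} in an AsmString with concrete operand text.

    A simplification: real AsmStrings can also contain variant syntax
    ({a|b}) and operand modifiers (${op:mod}), which are left untouched here.
    """
    def replace(match: "re.Match") -> str:
        name = match.group(1) or match.group(2)
        return operands[name]

    return _OPERAND_REF.sub(replace, asm_string)
```

For example, after `records = json.load(open("hexagon.json"))`, looping over `target_instructions(records, "Hexagon")` and printing `rec["AsmString"]` next to `fixed_encoding(rec["Inst"])` (for records that have an `Inst` field) would list each instruction with its fixed opcode bits. For the A2_abs definition above, the two `let Inst{...}` lines alone give the mask `0xFFE03FE0` and value `0x8C800080`, and `render_asm("$Rd32 = abs($Rs32)", {"Rd32": "r0", "Rs32": "r1"})` returns `"r0 = abs(r1)"`.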