Nvidia does not allow access to the LLVM IR generated in the compilation flow of a GPU kernel written in CUDA C/C++. Is this possible if I use Alea GPU? In other words, does the Alea GPU compilation procedure allow keeping the generated optimized/unoptimized LLVM IR code?
Yes, you are right: Nvidia does not show you the LLVM IR; you can only get the PTX code. Alea GPU, on the other hand, lets you access the LLVM IR in several ways:
With the workflow-based method, you code a GPU module as a template, compile the template into an LLVM IRModule, and then link that IRModule, optionally together with other IR modules, into a PTX module. Finally, you load the PTX module into the GPU worker. Once you have the LLVM IRModule, you can call its method Dump() to print the IR code to the console, or you can get the bitcode as byte[].
I suggest you read more details here:
The F# would be something like this:
    let template = cuda {
        // define your kernel functions or other GPU module stuff
        let! kernel = <@ fun .... @> |> Compiler.DefineKernel
        // return an entry point for this module, something like the
        // main() function of a C program
        return Entry(fun program ->
            let worker = program.Worker
            let kernel = program.Apply kernel
            let main() = ....
            main ) }

    let irModule = Compiler.Compile(template).IRModule
    irModule.Dump() // dump the IR code
    let ptxModule = Compiler.Link(irModule).PTXModule
    ptxModule.Dump() // dump the PTX code
    // worker: the GPU worker to load the program into (not defined in this snippet)
    use program = worker.LoadProgram(ptxModule)
    program.Run(...)
If you are using the method-based or instance-based way to code a GPU module, you can add event handlers for the generated LLVM IR code and the generated PTX code through Alea.CUDA.Events. The code in F# will look like:
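    open System
    open System.IO

    let desktopFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop)
    // small helper operator for joining path segments
    let (@@) a b = Path.Combine(a, b)

    // write the IR code to a file on the desktop whenever it is generated
    Events.Instance.IRCode.Add(fun ircode ->
        File.WriteAllBytes(desktopFolder @@ "module.ir", ircode))
    // same for the generated PTX code
    Events.Instance.PTXCode.Add(fun ptxcode ->
        File.WriteAllBytes(desktopFolder @@ "module.ptx", ptxcode))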
Finally, there is an undocumented way that lets you operate directly on the LLVM IR code to construct functions. It is done through an attribute that implements an IR-building interface. Here is a simple example that accepts a parameter, prints it (at compile time), and returns it:
    [<AttributeUsage(AttributeTargets.Method, AllowMultiple = false)>]
    type IdentityAttribute() =
        inherit Attribute()
        interface ICustomCallBuilder with
            member this.Build(ctx, irObject, info, irParams) =
                match irObject, irParams with
                | None, irParam :: [] ->
                    // irParam is of type IRValue; you can get its
                    // native LLVM handle via irParam.LLVM.
                    // irParam.Type gives its type as an IRType, and
                    // irParam.Type.LLVM gives the LLVMTypeRef handle.
                    // You can optionally construct LLVM instructions here.
                    printfn "irParam: %A" irParam
                    Some irParam
                | _ -> None

    [<Identity>]
    let identity(x:'T) : 'T = failwith "this is device function, better not call it from host"