This question is similar to cuModuleLoadDataEx options but I would like to bring the topic up again and in addition provide more information.
When loading a PTX string with the NV driver via cuModuleLoadDataEx it seems to ignore all options all together. I provide full working examples so that anyone interested can directly and with no effort reproduce this. First a small PTX kernel (save this as small.ptx) then the C++ program that loads the PTX kernel.
.version 3.1
.target sm_20, texmode_independent
.address_size 64
.entry main()
{
ret;
}
main.cc
#include<cstdlib>
#include<iostream>
#include<fstream>
#include<sstream>
#include<string>
#include<map>
#include "cuda.h"
int main(int argc,char *argv[])
{
CUdevice cuDevice;
CUcontext cuContext;
CUfunction func;
CUresult ret;
CUmodule cuModule;
cuInit(0);
std::cout << "trying to get device 0\n";
ret = cuDeviceGet(&cuDevice, 0);
if (ret != CUDA_SUCCESS) { exit(1);}
std::cout << "trying to create a context\n";
ret = cuCtxCreate(&cuContext, 0, cuDevice);
if (ret != CUDA_SUCCESS) { exit(1);}
std::cout << "loading PTX string from file " << argv[1] << "\n";
std::ifstream ptxfile( argv[1] );
std::stringstream buffer;
buffer << ptxfile.rdbuf();
ptxfile.close();
std::string ptx_kernel = buffer.str();
std::cout << "Loading PTX kernel with driver\n" << ptx_kernel;
const unsigned int jitNumOptions = 3;
CUjit_option *jitOptions = new CUjit_option[jitNumOptions];
void **jitOptVals = new void*[jitNumOptions];
// set up size of compilation log buffer
jitOptions[0] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
int jitLogBufferSize = 1024*1024;
jitOptVals[0] = (void *)&jitLogBufferSize;
// set up pointer to the compilation log buffer
jitOptions[1] = CU_JIT_INFO_LOG_BUFFER;
char *jitLogBuffer = new char[jitLogBufferSize];
jitOptVals[1] = jitLogBuffer;
// set up wall clock time
jitOptions[2] = CU_JIT_WALL_TIME;
float jitTime = -2.0;
jitOptVals[2] = &jitTime;
ret = cuModuleLoadDataEx( &cuModule , ptx_kernel.c_str() , jitNumOptions, jitOptions, (void **)jitOptVals );
if (ret != CUDA_SUCCESS) { exit(1);}
std::cout << "walltime: " << jitTime << "\n";
std::cout << std::string(jitLogBuffer) << "\n";
}
Build (assuming CUDA is installed under /usr/local/cuda, I use CUDA 5.0):
g++ -I/usr/local/cuda/include -L/usr/local/cuda/lib64/ main.cc -o main -lcuda
If someone is able to extract any sensible information from the compilation process that would be great! The documentation of CUDA driver API where cuModuleLoadDataEx is explained (and which options it is supposed to accept) http://docs.nvidia.com/cuda/cuda-driver-api/index.html
If I run this, the log is empty and jitTime
wasn't even touched by the NV driver:
./main small.ptx
trying to get device 0
trying to create a context
loading PTX string from file empty.ptx
Loading PTX kernel with driver
.version 3.1
.target sm_20, texmode_independent
.address_size 64
.entry main()
{
ret;
}
walltime: -2
EDIT:
I managed to get the JIT compile time. However it seems that the driver expects an array of 32bit values as OptVals. Not as stated in the manual as an array of pointers (void *
) which are on my system 64 bits. So, this works:
const unsigned int jitNumOptions = 1;
CUjit_option *jitOptions = new CUjit_option[jitNumOptions];
int *jitOptVals = new int[jitNumOptions];
jitOptions[0] = CU_JIT_WALL_TIME;
// here the call to cuModuleLoadDataEx
std::cout << "walltime: " << (float)jitOptions[0] << "\n";
I believe that it is not possible to do the same with an array of void *
. The following code does not work:
const unsigned int jitNumOptions = 1;
CUjit_option *jitOptions = new CUjit_option[jitNumOptions];
void **jitOptVals = new void*[jitNumOptions];
jitOptions[0] = CU_JIT_WALL_TIME;
// here the call to cuModuleLoadDataEx
// here I also would have a problem casting a 64 bit void * to a float (32 bit)
EDIT
Looking at the JIT compilation time jitOptVals[0]
was misleading. As mentioned in the comments, the JIT compiler caches previous translations and won't update the JIT compile time if it finds a cached compilation. Since I was looking whether this value has changed or not I assumed that the call ignores the options all together. Which it doesn't. It's works fine.
Your jitOptVals
should not contain pointers to your values, instead cast the values to void*
:
// set up size of compilation log buffer
jitOptions[0] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
int jitLogBufferSize = 1024*1024;
jitOptVals[0] = (void *)jitLogBufferSize;
// set up pointer to the compilation log buffer
jitOptions[1] = CU_JIT_INFO_LOG_BUFFER;
char *jitLogBuffer = new char[jitLogBufferSize];
jitOptVals[1] = jitLogBuffer;
// set up wall clock time
jitOptions[2] = CU_JIT_WALL_TIME;
float jitTime = -2.0;
//Keep jitOptVals[2] empty as it only an Output value:
//jitOptVals[2] = (void*)jitTime;
and after cuModuleLoadDataEx
, you get your jitTime
like jitTime = (float)jitOptions[2];