Executables built with nvcc are larger than those built with gcc/g++ and OpenCL

This is just something I've noticed, and I'm curious whether there is a reason for it.

Compiling some standard hello world code with Nvidia's nvcc compiler from its CUDA 7.0 toolkit on Ubuntu 14.04 results in an executable of the following size:

liang@liang-EX58-UD3R:~/Documents/cuda-test$ nvcc cudahello.cu -o cudahello
liang@liang-EX58-UD3R:~/Documents/cuda-test$ ls -lah cudahello
-rwxrwxr-x 1 liang liang 508K Jun 25 12:08 cudahello

The program is just a simple hello world program, with no kernel calls:

//cudahello.cu
#include <iostream>

int main(){
    std::cout << "helloworld\n";
    return 0;
}

On the other hand, an OpenCL executable is closer to the expected size for a C++ executable:

liang@liang-EX58-UD3R:~/Documents/opencl-test$ g++ -Wall -std=c++11 oclhello.cpp -lOpenCL -o oclhello
liang@liang-EX58-UD3R:~/Documents/opencl-test$ ls -lah oclhello
-rwxrwxr-x 1 liang liang 8.9K Jun 25 12:08 oclhello

This is also a simple helloworld program:

//oclhello.cpp
#include <CL/cl.h>
#include <iostream>

int main(){
    std::cout << "helloworld";
    return 0;
}

Is there a reason the CUDA executable is considerably larger? I've found that even when OpenCL functions are used in a C/C++ program, the executable doesn't grow to the size of CUDA executables.

Solution

The primary difference is that in your CUDA case, you are statically linking to libcudart, the CUDA runtime library (this is nvcc's default), which adds at least ~500K to the executable size.

The OpenCL executable is dynamically linked to libOpenCL.so, which means the size of that library does not contribute to the size of the executable.

To achieve approximate parity, link your CUDA application with the additional switch:

--cudart shared

which will force dynamic linking to libcudart, and the CUDA executable size will drop substantially.
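As a rough sketch of the comparison, assuming nvcc is on your PATH and cudahello.cu is in the current directory, you could build the program both ways and list the results side by side. The output names cudahello_static and cudahello_shared are made up for this example, and the exact sizes will depend on your toolkit version.

```shell
#!/bin/sh
# Build cudahello.cu with both CUDA runtime linking modes and compare sizes.
# Assumptions: nvcc is on PATH and cudahello.cu is in the current directory.
# The output names cudahello_static / cudahello_shared are hypothetical.
if command -v nvcc >/dev/null 2>&1 && [ -f cudahello.cu ]; then
    # Default behaviour: libcudart is linked statically.
    nvcc cudahello.cu -o cudahello_static
    # Same source, but libcudart is linked dynamically.
    nvcc --cudart shared cudahello.cu -o cudahello_shared
    ls -lh cudahello_static cudahello_shared
    status=built
else
    echo "nvcc or cudahello.cu not found; skipping build"
    status=skipped
fi
```

Note that the dynamically linked build needs libcudart.so to be findable at run time (for example via LD_LIBRARY_PATH), which is the usual trade-off for the smaller executable.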

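You can also observe the linking difference by running ldd on each executable: libcudart appears in the list of shared library dependencies only when it is dynamically linked.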
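A minimal sketch of that ldd check is below. The helper links_dynamically is a hypothetical name introduced for this example, and the loop assumes the two executables from the question are reachable at the paths shown; anything missing is simply skipped.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if executable $1 lists a shared library
# whose name contains $2 among its dynamic dependencies, as reported by ldd.
links_dynamically() {
    ldd "$1" 2>/dev/null | grep -q "$2"
}

# Assumed paths: adjust to wherever cudahello and oclhello were built.
for exe in ./cudahello ./oclhello; do
    [ -x "$exe" ] || continue
    for lib in libcudart libOpenCL; do
        if links_dynamically "$exe" "$lib"; then
            echo "$exe: $lib is dynamically linked"
        else
            echo "$exe: $lib is not in the dynamic dependency list"
        fi
    done
done
```

With the default nvcc build you would expect libcudart to be absent from the CUDA executable's list, and present once it is rebuilt with --cudart shared.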