tensorflow gpu dynamic-linking tensorflow-serving

How to load customized dynamic libs (*.so) in tensorflow serving (gpu)?

I wrote my own cudaMalloc wrapper, shown below, which I plan to use with TensorFlow Serving (GPU) to trace cudaMalloc calls via the LD_PRELOAD mechanism (with suitable modification, it could also be used to limit the GPU memory usage of each TF Serving container).

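#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for RTLD_NEXT in <dlfcn.h> */
#endif
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime_api.h>

typedef cudaError_t (*cu_malloc)(void **, size_t);

/* cudaMalloc wrapper: forwards to the real cudaMalloc, then logs the call */
cudaError_t cudaMalloc(void **devPtr, size_t size)
{
    cu_malloc real_cu_malloc = NULL;
    char *error;

    dlerror();  /* clear any stale error before calling dlsym */
    /* RTLD_NEXT: the next definition of cudaMalloc after this library */
    real_cu_malloc = (cu_malloc)dlsym(RTLD_NEXT, "cudaMalloc");
    if ((error = dlerror()) != NULL || real_cu_malloc == NULL) {
        fputs(error ? error : "cudaMalloc not found\n", stderr);
        exit(1);
    }
    cudaError_t res = real_cu_malloc(devPtr, size);
    /* Prints the address of the caller's pointer variable;
       use *devPtr instead to log the device address. */
    printf("cudaMalloc(%zu) = %p\n", size, (void *)devPtr);
    return res;
}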

I compile the above code into a dynamic lib file using the following command:

nvcc --compiler-options "-DRUNTIME -shared -fpic" --cudart=shared -o libmycudaMalloc.so mycudaMalloc.cu -ldl

When applied to a vector_add program compiled with nvcc -g --cudart=shared -o vector_add_dynamic vector_add.cu, it works as expected:

root@ubuntu:~# LD_PRELOAD=./libmycudaMalloc.so ./vector_add_dynamic 
cudaMalloc(800000) = 0x7ffe22ce1580
cudaMalloc(800000) = 0x7ffe22ce1588
cudaMalloc(800000) = 0x7ffe22ce1590

But when I apply it to TensorFlow Serving using the following command, the cudaMalloc calls do not go through the library I wrote.

root@ubuntu:~# LD_PRELOAD=/root/libmycudaMalloc.so ./tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=resnet --model_base_path=/models/resnet

So here are my questions:

Is it because tensorflow-serving is built fully statically, so that it links against libcudart_static.a instead of libcudart.so?
If so, how could I build tf-serving to enable dynamic linking?

Solution

Is it because tensorflow-serving is built fully statically, so that it links against libcudart_static.a instead of libcudart.so?

It probably isn't built fully static. A fully static binary has no NEEDED entries, which you can check by running:

readelf -d tensorflow_model_server | grep NEEDED

But it probably is linked with libcudart_static.a. You can see whether it is or not with:

readelf -Ws tensorflow_model_server | grep ' cudaMalloc$'

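If cudaMalloc shows up as an undefined symbol (UND in the Ndx column of the readelf output; nm would show U), as it does for the vector_add_dynamic binary, then the call is resolved by the dynamic loader at run time and LD_PRELOAD should work. But you'll probably see a defined symbol instead (a numeric section index in readelf; T or t in nm). In that case the call was bound to the statically linked copy at link time, so the dynamic loader never gets a chance to substitute your wrapper.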
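If you would rather do the same check programmatically (for example in a container health check), here is a minimal sketch in C that walks the symbol tables of a 64-bit ELF file. The function names classify_symbol and scan_image are hypothetical helpers made up for this illustration, not part of any library, and the code assumes a well-formed, little-endian 64-bit binary with ordinary section headers.

```c
#include <elf.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: scan .symtab and .dynsym of a mapped 64-bit ELF image.
 * Returns 1 if `name` is defined in the file, 0 if it only appears as an
 * undefined reference, -1 if it is absent or the image is not usable. */
static int scan_image(const unsigned char *base, uint64_t len, const char *name)
{
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    int result = -1;

    if (len < sizeof(*eh) || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
        eh->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    if (eh->e_shoff == 0 || eh->e_shoff > len ||
        (uint64_t)eh->e_shnum * sizeof(Elf64_Shdr) > len - eh->e_shoff)
        return -1;

    const Elf64_Shdr *sh = (const Elf64_Shdr *)(base + eh->e_shoff);
    for (unsigned i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_type != SHT_SYMTAB && sh[i].sh_type != SHT_DYNSYM)
            continue;
        if (sh[i].sh_link >= eh->e_shnum)
            continue;
        const Elf64_Shdr *str = &sh[sh[i].sh_link];   /* matching string table */
        if (sh[i].sh_offset > len || sh[i].sh_size > len - sh[i].sh_offset ||
            str->sh_offset > len || str->sh_size > len - str->sh_offset)
            continue;

        const Elf64_Sym *syms = (const Elf64_Sym *)(base + sh[i].sh_offset);
        const char *names = (const char *)(base + str->sh_offset);
        uint64_t count = sh[i].sh_size / sizeof(Elf64_Sym);

        for (uint64_t j = 0; j < count; j++) {
            if (syms[j].st_name >= str->sh_size ||
                strcmp(names + syms[j].st_name, name) != 0)
                continue;
            if (syms[j].st_shndx != SHN_UNDEF)
                return 1;   /* defined inside the file: cannot be interposed */
            result = 0;     /* undefined so far; keep looking for a definition */
        }
    }
    return result;
}

/* Hypothetical helper: map the file at `path` read-only and classify `name`. */
int classify_symbol(const char *path, const char *name)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    void *map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;
    int result = scan_image(map, (uint64_t)st.st_size, name);
    munmap(map, (size_t)st.st_size);
    return result;
}
```

Calling classify_symbol("tensorflow_model_server", "cudaMalloc") and getting 1 would point to a statically linked CUDA runtime, while 0 would mean the symbol is left for the dynamic loader. Note that a stripped binary has no .symtab, so a statically linked cudaMalloc could be reported as absent (-1) rather than defined.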

If so, how could I build tf-serving to enable dynamic linking?

Sure: it's open-source. All you have to do is figure out how to build it, then how to build it without libcudart_static.a, and then figure out what (if anything) breaks when you do so.
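Once you have a rebuilt binary, one way to confirm at run time that the shared CUDA runtime is really in the process is to let the preloaded library report on it at startup. The sketch below uses glibc's dl_iterate_phdr for this; the names count_loaded_objects, match_cb and report_cudart are hypothetical and chosen only for this illustration. It assumes the preload library itself is not linked against libcudart.so (if it is, for example through nvcc --cudart=shared, the runtime gets loaded as its dependency regardless of how the server was built).

```c
#define _GNU_SOURCE   /* needed for dl_iterate_phdr in <link.h> */
#include <link.h>
#include <stdio.h>
#include <string.h>

/* State passed to the callback: what to look for and how often it was seen. */
struct match_state {
    const char *needle;
    int count;
};

/* Called once per loaded object (main program and each shared library). */
static int match_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct match_state *st = data;
    (void)size;
    if (info->dlpi_name != NULL && strstr(info->dlpi_name, st->needle) != NULL)
        st->count++;
    return 0;   /* returning 0 continues the iteration */
}

/* Hypothetical helper: number of loaded objects whose path contains `needle`. */
int count_loaded_objects(const char *needle)
{
    struct match_state st = { needle, 0 };
    dl_iterate_phdr(match_cb, &st);
    return st.count;
}

/* Runs when the library is loaded, before main() of the host program. */
__attribute__((constructor))
static void report_cudart(void)
{
    if (count_loaded_objects("libcudart.so") == 0)
        fprintf(stderr, "[preload] libcudart.so is not loaded; "
                        "cudaMalloc is probably linked statically\n");
}
```

If the message appears when the library is preloaded into tensorflow_model_server, the wrapper has nothing to interpose on; if it does not, the shared runtime is present and the cudaMalloc trace lines should show up.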