
How to specify alignment for global device variables in CUDA


I would like to specify the alignment of a global device variable in CUDA. Specifically, I have a string declaration like __device__ char str1[] = "some pre-defined string"; With plain gcc, I can request alignment from the compiler as __device__ char str1[] __attribute__ ((aligned (4))) = "some pre-defined string";

However, when I tried this with nvcc, the compiler ignored the request. I want the alignment because my kernels copy these strings into a buffer, and copying a word at a time is much faster than copying a byte at a time, but word-wise copies require the source string to be aligned. Can anyone please tell me how to request alignment from the nvcc compiler?


Solution

  • See section 5.3.2 "Size and Alignment Requirement" of the "CUDA C Programming Guide":

    The alignment requirement is automatically fulfilled for the built-in types char, short, int, long, long long, float, double, and for the built-in vector types like float2 or float4.

    For structs, the size and alignment requirements can be enforced by the compiler using the alignment specifiers __align__(8) or __align__(16).

    Example usage:

    struct __align__(8) { 
        float r; 
        float i;
    } complex_num;
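
    For the string in the question, the same specifier can be tried on the variable itself. The following is a minimal sketch, not a definitive implementation: it assumes that nvcc honors __align__ on a __device__ array as it does on a struct, which is worth verifying on your toolkit version. The names copy_words and d_buf are hypothetical, and the array size is written out as 24 (23 characters plus the terminating NUL) so that it is a whole number of 4-byte words.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Request 4-byte alignment for the global device string.
// The size 24 is a multiple of 4, so a word-wise copy never
// reads past the end of the array.
__device__ __align__(4) char str1[24] = "some pre-defined string";

// Hypothetical helper: copies n_words 32-bit words from str1 to dst.
// dst must be 4-byte aligned as well; memory from cudaMalloc is.
// Reading a char array through a uint32_t pointer is a common CUDA
// idiom but relies on the alignment requested above.
__global__ void copy_words(uint32_t *dst, int n_words)
{
    const uint32_t *src = reinterpret_cast<const uint32_t *>(str1);
    for (int i = threadIdx.x; i < n_words; i += blockDim.x)
        dst[i] = src[i];
}

int main()
{
    const int n_bytes = 24;
    uint32_t *d_buf = nullptr;
    if (cudaMalloc(&d_buf, n_bytes) != cudaSuccess) return 1;

    // One block of 32 threads; each thread copies at most one word here.
    copy_words<<<1, 32>>>(d_buf, n_bytes / 4);

    char h_buf[n_bytes];
    if (cudaMemcpy(h_buf, d_buf, n_bytes, cudaMemcpyDeviceToHost) != cudaSuccess)
        return 1;

    printf("%s\n", h_buf);  // h_buf includes the copied NUL terminator
    cudaFree(d_buf);
    return 0;
}
```

    If the string length is not a multiple of the word size, either pad the array as above or copy the remaining tail bytes one at a time.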