I would like to declare the alignment for a global device variable in CUDA. Specifically, I have a string declaration, like
__device__ char str1[] = "some pre-defined string";
With plain gcc, I can request alignment from the compiler like this:
__device__ char str1[] __attribute__ ((aligned (4))) = "some pre-defined string";
However, when I try this with nvcc, the compiler ignores the request. I want the alignment because my kernels copy these strings into a buffer, and copying a word at a time is much faster than copying a byte at a time, but word-wise copies require the source string to be aligned. Can anyone tell me how to request alignment from the nvcc compiler?
See section 5.3.2 "Size and Alignment Requirement" of the "CUDA C Programming Guide", which can be found here:
The alignment requirement is automatically fulfilled for the built-in vector types based on char, short, int, long, longlong, float and double, such as float2 or float4.
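As a quick check of what "automatically fulfilled" means in practice, the following compile-time sketch asserts the alignment of a few built-in vector types. It assumes a C++11-capable nvcc and only relies on the types that `cuda_runtime.h` brings in; it is an illustration, not something taken from the guide.

```cuda
#include <cuda_runtime.h>  // pulls in the built-in vector types

// These types carry their alignment in the type definition itself.
static_assert(alignof(char4)  == 4,  "char4 should be 4-byte aligned");
static_assert(alignof(float2) == 8,  "float2 should be 8-byte aligned");
static_assert(alignof(float4) == 16, "float4 should be 16-byte aligned");

// float3 has no extra alignment: it is aligned like a single float.
static_assert(alignof(float3) == alignof(float), "float3 is aligned like float");
```

If the file compiles (for example with `nvcc -c`), all of the assertions hold.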
For structs, the size and alignment requirements can be enforced by the compiler using the alignment specifiers __align__(8) or __align__(16).
Example usage:
struct __align__(8) {
float r;
float i;
} complex_num;
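For the string case in the question, a minimal sketch along the same lines might look like the following. It assumes that your nvcc version accepts `__align__` on a `__device__` array declaration (it is commonly written this way for variables, but I have not verified it against the asker's toolchain). The names `kLen`, `copy_words` and `copy_demo` are made up for illustration. The string is 23 characters plus the terminating NUL, so it occupies exactly 24 bytes and can be copied as 6 whole words without reading past the end of the array.

```cuda
#include <cuda_runtime.h>
#include <string>

// 23 characters plus the terminating NUL: exactly 24 bytes, i.e. 6 words.
constexpr int kLen = 24;

// Source string in device global memory, explicitly 4-byte aligned.
__device__ __align__(4) char str1[kLen] = "some pre-defined string";

// Each thread copies one 32-bit word of str1 into dst.
// dst comes from cudaMalloc, which is already suitably aligned.
__global__ void copy_words(unsigned int *dst, int nwords)
{
    const unsigned int *src = reinterpret_cast<const unsigned int *>(str1);
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nwords)
        dst[i] = src[i];
}

// Hypothetical host helper: runs the kernel and returns the copied string,
// or an empty string if any CUDA call fails.
std::string copy_demo()
{
    unsigned int *d_buf = nullptr;
    if (cudaMalloc(&d_buf, kLen) != cudaSuccess)
        return std::string();

    copy_words<<<1, 32>>>(d_buf, kLen / 4);

    char h_buf[kLen] = {0};
    // cudaMemcpy waits for the kernel to finish before copying back.
    cudaError_t err = cudaMemcpy(h_buf, d_buf, kLen, cudaMemcpyDeviceToHost);
    cudaFree(d_buf);
    return err == cudaSuccess ? std::string(h_buf) : std::string();
}
```

Calling `copy_demo()` from `main` should give back the original string if the word-wise copy worked. For strings whose length is not a multiple of 4, pad the array to the next multiple so the last word read stays inside the object.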