2D textures are a useful feature of CUDA in image processing applications. To bind pitch linear memory to 2D textures, the memory has to be aligned. cudaMallocPitch
is a good option for aligned memory allocation. On my device, the pitch returned by cudaMallocPitch
is a multiple of 512, i.e the memory is 512 byte aligned.
The actual alignment requirement for the device is determined by cudaDeviceProp::texturePitchAlignment
which is 32 bytes on my device.
My question is:
If the actual alignment requirement for 2D textures is 32 bytes, then why does cudaMallocPitch
return 512 byte aligned memory?
Isn't it a waste of memory? For example if I create an 8 bit image of size 513 x 100, it will occupy 1024 x 100 bytes.
I get this behaviour on following systems:
1: Asus G53JW + Windows 8 x64 + GeForce GTX 460M + CUDA 5 + Core i7 740QM + 4GB RAM
2: Dell Inspiron N5110 + Windows 7 x64 + GeForce GT525M + CUDA 4.2 + Corei7 2630QM + 6GB RAM
This is a slightly speculative answer, but keep in mind that there are two alignment properties which the pitch of an allocation must satisfy for textures, one for the texture pointer and one for the texture rows. I suspect that cudaMallocPitch
is honouring the former, defined by cudaDeviceProp::textureAlignment
. For example:
#include <cstdio>
int main(void)
{
const int ncases = 12;
const size_t widths[ncases] = { 5, 10, 20, 50, 70, 90, 100,
200, 500, 700, 900, 1000 };
const size_t height = 10;
float *vals[ncases];
size_t pitches[ncases];
struct cudaDeviceProp p;
cudaGetDeviceProperties(&p, 0);
fprintf(stdout, "Texture alignment = %zd bytes\n",
p.textureAlignment);
cudaSetDevice(0);
cudaFree(0); // establish context
for(int i=0; i<ncases; i++) {
cudaMallocPitch((void **)&vals[i], &pitches[i],
widths[i], height);
fprintf(stdout, "width = %zd <=> pitch = %zd \n",
widths[i], pitches[i]);
}
return 0;
}
which gives the following on a GT320M:
Texture alignment = 256 bytes
width = 5 <=> pitch = 256
width = 10 <=> pitch = 256
width = 20 <=> pitch = 256
width = 50 <=> pitch = 256
width = 70 <=> pitch = 256
width = 90 <=> pitch = 256
width = 100 <=> pitch = 256
width = 200 <=> pitch = 256
width = 500 <=> pitch = 512
width = 700 <=> pitch = 768
width = 900 <=> pitch = 1024
width = 1000 <=> pitch = 1024
I am guessing that cudaDeviceProp::texturePitchAlignment
applies to CUDA arrays.