Search code examples
pythoncudapycuda

Explain pitch, width, height, depth in memory for 3D arrays


I am working with CUDA and 3D textures in python (using pycuda). There is a function called Memcpy3D which has the same members as Memcpy2D plus a few extras. In it it calls you to describe things such as width_in_bytes, src_pitch, src_height, height and copy_depth. This is what I am struggling with (in 3D) and its relevance with C or F style indexing. For instance, if I simply change the ordering from F to C in the working example below, it stops working - and I don't know why.

  1. First of all, I understand pitch to be how many bytes in memory it takes to move one index across in threadIdx.x (or the x direction, or a column). So for a float32 array of C shape (3,2,4), to move one value in x I expect to move 4 values in memory (as the indexing goes down the z axis first?). Therefore my pitch would be 4*32bits.
  2. I understand height to be the number of rows. (In this example, 3)
  3. I understand width to be the number of cols. (In this example, 2)
  4. I understand depth to be the number of z slices. (In this example, 4)
  5. I understand width_in_bytes to be the width of a row in x inclusive of the z elements behind it, i.e. a row slice, (0,:,:). This would be how many addresses in memory it takes to transverse one element in the y-direction.

So when I change the ordering from F to C in the code below, and adapt the code to change the height/width values accordingly it still doesn't work. It just presents a logic failure which makes me think I'm not understanding the concept of pitch, width, height, depth correctly.

Please educate me.

Below is a full working script that copies an array to the GPU as a texture and copies the contents back.

import pycuda.driver as drv
import pycuda.gpuarray as gpuarray
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy as np

w = 2
h = 3
d = 4
shape = (w, h, d)

a = np.arange(24).reshape(*shape,order='F').astype('float32')
print(a.shape,a.strides)
print(a)


descr = drv.ArrayDescriptor3D()
descr.width = w
descr.height = h
descr.depth = d
descr.format = drv.dtype_to_array_format(a.dtype)
descr.num_channels = 1
descr.flags = 0

ary = drv.Array(descr)

copy = drv.Memcpy3D()
copy.set_src_host(a)
copy.set_dst_array(ary)
copy.width_in_bytes = copy.src_pitch = a.strides[1]
copy.src_height = copy.height = h
copy.depth = d

copy()

mod = SourceModule("""
    texture<float, 3, cudaReadModeElementType> mtx_tex;

    __global__ void copy_texture(float *dest)
    {
      int x = threadIdx.x;
      int y = threadIdx.y;
      int z = threadIdx.z;
      int dx = blockDim.x;
      int dy = blockDim.y;
      int i = (z*dy + y)*dx + x;
      dest[i] = tex3D(mtx_tex, x, y, z);
    }
""")

copy_texture = mod.get_function("copy_texture")
mtx_tex = mod.get_texref("mtx_tex")

mtx_tex.set_array(ary)

dest = np.zeros(shape, dtype=np.float32, order="F")
copy_texture(drv.Out(dest), block=shape, texrefs=[mtx_tex])

print(dest)

Solution

  • Not sure I fully understand the problem in your code, but I'll attempt to clarify.

    In CUDA, width (x) refers to the fastest-changing dimension, height (y) is the middle dimension, and depth (z) is the slowest-changing dimension. The pitch refers to the stride in bytes required to step between values along the y dimension.

    In Numpy, an array defined as np.empty(shape=(3,2,4), dtype=np.float32, order="C") has strides=(32, 16, 4), and corresponds to width=4, height=2, depth=3, pitch=16.

    Using "F" ordering in Numpy means the order of dimensions is reversed in memory.

    Your code appears to work if I make the following changes:

    #shape = (w, h, d)
    shape = (d, h, w)
    
    #a = np.arange(24).reshape(*shape,order='F').astype('float32')
    a = np.arange(24).reshape(*shape,order='C').astype('float32')
    ...
    #dest = np.zeros(shape, dtype=np.float32, order="F")
    dest = np.zeros(shape, dtype=np.float32, order="C")
    #copy_texture(drv.Out(dest), block=shape, texrefs=[mtx_tex])
    copy_texture(drv.Out(dest), block=(w,h,d), texrefs=[mtx_tex])