The program I am inspecting uses PyTorch to load the weights and CUDA code to do the computations with them. My understanding is that the THC library is how CUDA tensors are implemented in the backend of PyTorch (and Torch, maybe?).
I am still not exactly sure of the inner workings of THCudaTensor_data, but the behaviour that was tripping me up was this: for an n-dimensional tensor, THCudaTensor_data gives you the tensor's data as a flattened 1D array, so the CUDA code has to work out the position of each element itself.
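To make the flattening concrete, here is a minimal sketch of how a multi-dimensional index maps to a position in the flat array. It assumes the tensor is contiguous and stored in row-major (C) order, which is what I would expect for a freshly loaded weight tensor but have not verified for every case. The helper name `flat_offset` is made up for illustration and is not part of THC or PyTorch.

```python
def flat_offset(index, shape):
    """Return the position of `index` in a flattened, row-major array.

    Assumes a contiguous tensor; a tensor with custom strides
    (e.g. after a transpose) would need its strides instead.
    """
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same number of dimensions")
    offset = 0
    for i, dim in zip(index, shape):
        if not 0 <= i < dim:
            raise IndexError("index out of range for shape")
        # Step over `dim` elements for every unit of the previous dimensions.
        offset = offset * dim + i
    return offset


# A 2 x 3 x 4 tensor stored as 24 consecutive values.
shape = (2, 3, 4)
flat = list(range(2 * 3 * 4))

# Element [1][2][3] sits at (1 * 3 + 2) * 4 + 3 in the flat array.
print(flat[flat_offset((1, 2, 3), shape)])
```

In the CUDA code the same arithmetic would be applied to the pointer you get back, i.e. indexing it with the computed offset rather than with separate per-dimension indices.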
Hope this helps