I'm developing a neural network to run on the GPU under iOS, and in order to backpropagate the errors using matrix notation I need to be able to compute the outer product of two vectors.
// Outer product of vector A and vector B
kernel void outerProduct(const device float *inVectorA [[ buffer(0) ]],
                         const device float *inVectorB [[ buffer(1) ]],
                         device float *outVector       [[ buffer(2) ]],
                         uint id [[ thread_position_in_grid ]]) {
    outVector[id] = inVectorA[id] * inVectorB[***?***]; // How to find this position in the threadgroup (or grid)?
}
You are using thread_position_in_grid incorrectly. If you are dispatching a 2D grid, the argument should be declared as uint2 or ushort2; declared as a plain uint, it only receives the x coordinate. Refer to table 5.7 in the Metal Shading Language specification.
I'm not sure which outer product we are talking about, but I think the output should be a matrix. If you are storing it linearly (row-major), then your code to calculate outVector should look something like this:
kernel void outerProduct(const device float *inVectorA [[ buffer(0) ]],
                         const device float *inVectorB [[ buffer(1) ]],
                         uint2 gridSize [[ threads_per_grid ]],
                         device float *outVector       [[ buffer(2) ]],
                         uint2 id [[ thread_position_in_grid ]]) {
    // Each thread writes one element of the flattened (row-major) result matrix.
    outVector[id.y * gridSize.x + id.x] = inVectorA[id.x] * inVectorB[id.y];
}
Also, if you are dispatching a grid that is exactly the size of inVectorA × inVectorB, you can use the threads_per_grid attribute on a kernel argument to find out how big the grid is. Alternatively, you can just pass the sizes of the vectors alongside the vectors themselves.
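That second option could look something like the sketch below, assuming a small constant buffer holding the two lengths is bound at index 3 (the buffer index and kernel name are assumptions, not something from your dispatch code). The bounds check also guards against the grid being rounded up to a threadgroup multiple:

```metal
// Variant: vector lengths passed in a constant buffer instead of
// relying on threads_per_grid. Buffer index 3 is an assumption.
kernel void outerProductSized(const device float *inVectorA [[ buffer(0) ]],
                              const device float *inVectorB [[ buffer(1) ]],
                              device float *outVector       [[ buffer(2) ]],
                              constant uint2 &sizes         [[ buffer(3) ]],
                              uint2 id [[ thread_position_in_grid ]]) {
    // Skip threads outside the actual matrix (grid may be padded).
    if (id.x >= sizes.x || id.y >= sizes.y) return;
    outVector[id.y * sizes.x + id.x] = inVectorA[id.x] * inVectorB[id.y];
}
```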