I'm developing a neural network to run on the GPU under iOS, and in order to backpropagate the errors using matrix notation I need to be able to compute the outer product of two vectors.
// Outer product of vector A and vector B
kernel void outerProduct(const device float *inVectorA [[ buffer(0) ]],
                         const device float *inVectorB [[ buffer(1) ]],
                         device float *outVector       [[ buffer(2) ]],
                         uint id [[ thread_position_in_grid ]]) {
    outVector[id] = inVectorA[id] * inVectorB[***?***]; // How to find this position in the threadgroup (or grid)?
}
You are using thread_position_in_grid incorrectly. If you are dispatching a 2D grid, the argument should be declared as uint2 or ushort2; declared as a plain uint, it only receives the x coordinate. Refer to table 5.7 in the Metal Shading Language specification.
I'm not sure which outer product we are talking about, but I think the output should be a matrix. If you are storing it linearly (row-major), then your code to calculate outVector should look something like this:
kernel void outerProduct(const device float *inVectorA [[ buffer(0) ]],
                         const device float *inVectorB [[ buffer(1) ]],
                         uint2 gridSize [[ threads_per_grid ]],
                         device float *outVector       [[ buffer(2) ]],
                         uint2 id [[ thread_position_in_grid ]]) {
    // Each thread writes one element of the flattened (row-major) result matrix.
    outVector[id.y * gridSize.x + id.x] = inVectorA[id.x] * inVectorB[id.y];
}
Also, if you are dispatching a grid that is exactly the size of inVectorA × inVectorB, you can use the threads_per_grid attribute on a kernel argument to find out how big the grid is. Alternatively, you can just pass the sizes of the vectors alongside the vectors themselves.
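That second option could look something like the sketch below, assuming a small constant buffer holding the two lengths is bound at index 3 (the buffer index and kernel name are assumptions, not something from your dispatch code). The bounds check also guards against the grid being rounded up to a threadgroup multiple:

```metal
// Variant: vector lengths passed in a constant buffer instead of
// relying on threads_per_grid. Buffer index 3 is an assumption.
kernel void outerProductSized(const device float *inVectorA [[ buffer(0) ]],
                              const device float *inVectorB [[ buffer(1) ]],
                              device float *outVector       [[ buffer(2) ]],
                              constant uint2 &sizes         [[ buffer(3) ]],
                              uint2 id [[ thread_position_in_grid ]]) {
    // Skip threads outside the actual matrix (grid may be padded).
    if (id.x >= sizes.x || id.y >= sizes.y) return;
    outVector[id.y * sizes.x + id.x] = inVectorA[id.x] * inVectorB[id.y];
}
```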