Search code examples
objective-cswiftmetalmetal-performance-shaders

Apple Metal Element-wise Matrix Multiplication (Hadamard Product)


Is it possible to perform a Hadamard Product using Apple's Metal Performance Shaders? I see that a normal matrix multiplication can be performed using this, but I am specifically looking for an element-wise multiplication, or a clever way to construct one. (For instance, is it possible to convert a MPSMatrix into a MPSVector and then perform the product using vectors?)

UPDATE: I appreciate the recommendation to use shaders! I'm working on an implementation and this looks promising! I will post the solution once I have something working.


Solution

  • Alright, answering my own question here based on recommendations from commenters- try writing my own shader!

    Here's the shader code:

    #include <metal_stdlib>
    using namespace metal;
    
    /*
     hadamardProduct:
     Perform an element-wise multiplication (hadamard product) of the two input matrices A and B, store the result in C
     */
    kernel void hadamardProductKernel(
    texture_buffer<float, access::read> A [[texture(0)]],
    texture_buffer<float, access::read> B [[texture(1)]],
    texture_buffer<float, access::write> C [[texture(2)]],
    uint gid [[thread_position_in_grid]]) {
        // C[i,j] = A[i,j] * B[i,j]
        C.write(A.read(gid) * B.read(gid), gid);
    }
    

    And the swift that executes the shader on two 4x4 matrices:

    import Foundation
    import Metal
    import MetalKit
    
    guard
        let gpu = MTLCreateSystemDefaultDevice(),
        let commandQueue = gpu.makeCommandQueue(),
        let commandBuffer = commandQueue.makeCommandBuffer(),
        let defaultLibrary = gpu.makeDefaultLibrary(),
        let kernelFunction = defaultLibrary.makeFunction(name: "hadamardProductKernel")
    else {exit(1)}
    
    // Create the matrices to multiply (as row-major matrices)
    var A:[Float] = [2,0,0,0,
                     0,2,0,0,
                     0,0,2,0,
                     0,0,0,2]
    
    var B:[Float] = [1,0,0,0,
                     0,2,0,0,
                     0,0,3,0,
                     0,0,0,4]
    
    let A_buffer = gpu.makeTexture(descriptor: MTLTextureDescriptor.textureBufferDescriptor(with: .r32Float,
                                                                                                                                                                                    width: 16,
                                                                                                                                                                                    resourceOptions: .storageModeManaged,
                                                                                                                                                                                    usage: .shaderRead))
    let B_buffer = gpu.makeTexture(descriptor: MTLTextureDescriptor.textureBufferDescriptor(with: .r32Float,
                                                                                                                                                                                    width: 16,
                                                                                                                                                                                    resourceOptions: .storageModeManaged,
                                                                                                                                                                                    usage: .shaderRead))
    let C_buffer = gpu.makeTexture(descriptor: MTLTextureDescriptor.textureBufferDescriptor(with: .r32Float,
                                                                                                                                                                                    width: 16,
                                                                                                                                                                                    resourceOptions: .storageModeManaged,
                                                                                                                                                                                    usage: .shaderWrite))
    A_buffer?.replace(region: MTLRegionMake1D(0, 16),
                      mipmapLevel: 0,
                      withBytes: UnsafeRawPointer(A),
                      bytesPerRow: 64)
    B_buffer?.replace(region: MTLRegionMake1D(0, 16),
                      mipmapLevel: 0,
                      withBytes: UnsafeRawPointer(B),
                      bytesPerRow: 64)
    
    let computePipelineState = try gpu.makeComputePipelineState(function: kernelFunction)
    let computeEncoder = commandBuffer.makeComputeCommandEncoder()
    computeEncoder?.setComputePipelineState(computePipelineState)
    computeEncoder?.setTexture(A_buffer, index: 0)
    computeEncoder?.setTexture(B_buffer, index: 1)
    computeEncoder?.setTexture(C_buffer, index: 2)
    let threadGroupSize = MTLSize(width: 16, height: 1, depth: 1)
    let threadGroupCount = MTLSize(width: 1, height: 1, depth: 1)
    computeEncoder?.dispatchThreadgroups(threadGroupCount, threadsPerThreadgroup: threadGroupSize)
    computeEncoder?.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
    
    print("done")
    

    Appreciate any comments linking to resources for further learning about this kind of thing.