ios swift metal core-image coreml

How can I read the CVPixelBuffer as 4 channel float format from a CIImage?


I'm currently trying to do some calculations on a CIImage. We run a custom Core ML model on video frames and, in the meantime, use the GPU to convert them to the required formats with CIFilters.

For one step, I need to do some calculations on two of the outputs generated by the model, and find the mean and standard deviation of the pixel data per channel.

For testing and a tech preview, I was able to create a UIImage, read the pixel data from a CVPixelBuffer, and convert and calculate on the CPU. But while trying to adapt it to the GPU I hit a wall.

The process is simple:

  • Convert the CIImage from BGRA to LAB format. We do not need the alpha channel, but it is kept, so the result is LAB-A.
  • Do calculations on the pixel data.
  • Return from LAB to BGRA, and copy the alpha channel as is.
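
As a CPU reference for the calculation step, which a GPU version can be checked against, a minimal sketch of the per-channel mean and standard deviation might look like this. It assumes the pixels are already available as a tightly packed, interleaved array of 4-channel Float values (no row padding), and it computes the population standard deviation. `channelStatistics` is a hypothetical helper name, not part of any framework.

```swift
/// Per-channel mean and (population) standard deviation of interleaved pixel data.
/// Assumption: `pixels` is tightly packed, e.g. L, A, B, alpha, L, A, B, alpha, ...
func channelStatistics(_ pixels: [Float], channels: Int = 4) -> (mean: [Float], stdDev: [Float]) {
    precondition(channels > 0 && pixels.count % channels == 0,
                 "pixel data must contain whole pixels")
    let pixelCount = pixels.count / channels
    // Accumulate in Double to limit rounding error on large images.
    var mean = [Double](repeating: 0, count: channels)
    var squaredDeviation = [Double](repeating: 0, count: channels)
    guard pixelCount > 0 else {
        return (mean.map { Float($0) }, squaredDeviation.map { Float($0) })
    }

    // First pass: per-channel mean.
    for (index, value) in pixels.enumerated() {
        mean[index % channels] += Double(value)
    }
    for channel in 0..<channels {
        mean[channel] /= Double(pixelCount)
    }

    // Second pass: sum of squared deviations from the mean.
    for (index, value) in pixels.enumerated() {
        let deviation = Double(value) - mean[index % channels]
        squaredDeviation[index % channels] += deviation * deviation
    }

    let stdDev = squaredDeviation.map { Float(($0 / Double(pixelCount)).squareRoot()) }
    return (mean.map { Float($0) }, stdDev)
}

// Two pixels with channels (1, 10, 100, 1) and (3, 30, 300, 1).
let stats = channelStatistics([1, 10, 100, 1, 3, 30, 300, 1])
// stats.mean   == [2, 20, 200, 1]
// stats.stdDev == [1, 10, 100, 0]
```

The two-pass form is used here because it is easy to verify by hand; a single-pass sum and sum-of-squares variant is cheaper but loses more precision.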

Currently, I am using a custom CIFilter with a Metal kernel to convert the CIImage from RGB to LAB (and back to RGB). Without calculations in between, the RGB > LAB > RGB conversion works as expected and returns the same image without any distortion. This tells me that the float precision is not lost.

But when I try to read the pixel data in between, I'm not able to get the float values I am looking for. The CVPixelBuffer created from the LAB-formatted CIImage gives me values that are always zero. I tried a few different OSType formats like kCVPixelFormatType_64RGBAHalf, kCVPixelFormatType_128RGBAFloat, kCVPixelFormatType_32ARGB, etc., but none of them return the float values. If I read data from another image, though, I always get the UInt8 values as expected...

So my question is, as the title suggests: "How can I read the CVPixelBuffer as a 4 channel float format from a CIImage?"

Simplified Swift and Metal code for the process is as follows.

let ciRgbToLab = CIConvertRGBToLAB() // CIFilter using metal for kernel
let ciLabToRgb = CIConvertLABToRGB() // CIFilter using metal for kernel

ciRgbToLab.inputImage = source // "source" is a CIImage
guard let sourceLab = ciRgbToLab.outputImage else { throw ... }

ciRgbToLab.inputImage = target // "target" is a CIImage
guard let targetLab = ciRgbToLab.outputImage else { throw ... }

// Get the CVPixelBuffer and lock the data.
guard let sourceBuffer = sourceLab.cvPixelBuffer else { throw ... }
CVPixelBufferLockBaseAddress(sourceBuffer, CVPixelBufferLockFlags(rawValue: 0))
defer {
  CVPixelBufferUnlockBaseAddress(sourceBuffer, CVPixelBufferLockFlags(rawValue: 0))
}

// Access to the data
guard let sourceAddress = CVPixelBufferGetBaseAddress(sourceBuffer) else { throw ... }
let sourceDataSize = CVPixelBufferGetDataSize(sourceBuffer)
let sourceData = sourceAddress.bindMemory(to: CGFloat.self, capacity: sourceDataSize)
// ... do calculations
// ... generates a new CIImage named "targetTransfered"

ciLabToRgb.inputImage = targetTransfered //*
guard let rgbFinal = ciLabToRgb.outputImage else  { throw ... }

//* If "targetTransfered" is replaced with "targetLab", we get the exact image as "target".

#include <metal_stdlib>
using namespace metal;

#include <CoreImage/CoreImage.h>

extern "C" {
  namespace coreimage {
    float4 xyzToLabConversion(float4 pixel) {
      ...
      return float4(l, a, b, pixel.a);
    }
    
    float4 rgbToXyzConversion(float4 pixel) {
      ...
      return float4(x, y, z, pixel.a);
    }
    
    float4 rgbToLab(sample_t s) {
      float4 xyz = rgbToXyzConversion(s);
      float4 lab = xyzToLabConversion(xyz);
      return lab;
    }
    
    float4 xyzToRgbConversion(float4 pixel) {
      ...
      return float4(R, G, B, pixel.a);
    }
    
    float4 labToXyzConversion(float4 pixel) {
      ...
      return float4(X, Y, Z, pixel.a);
    }
    
    float4 labtoRgb(sample_t s) {
      float4 xyz = labToXyzConversion(s);
      float4 rgb = xyzToRgbConversion(xyz);
      return rgb;
    }
  }
}

This is the extension I'm using to convert CIImage to CVPixelBuffer. As the image is created on device by the same source, it is always in BGRA format. I have no idea how to convert this to get float values...

extension CIImage {
    var cvPixelBuffer: CVPixelBuffer? {
        let attrs = [
            kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
            kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue,
            kCVPixelBufferMetalCompatibilityKey: kCFBooleanTrue
        ] as CFDictionary

        var pixelBuffer: CVPixelBuffer?
        let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                         Int(self.extent.width),
                                         Int(self.extent.height),
                                         kCVPixelFormatType_32BGRA,
                                         attrs,
                                         &pixelBuffer)

        guard status == kCVReturnSuccess else { return nil }
        guard let buffer = pixelBuffer else { return nil }

        CVPixelBufferLockBaseAddress(buffer, CVPixelBufferLockFlags(rawValue: 0))

        let context = CIContext()
        context.render(self, to: buffer)

        CVPixelBufferUnlockBaseAddress(buffer, CVPixelBufferLockFlags(rawValue: 0))
        return buffer
    }
}
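
For comparison, here is a hedged sketch of what a float read-back could look like once the buffer really holds float data. Two details of the code above matter: the extension allocates a kCVPixelFormatType_32BGRA buffer (8 bits per channel), and `bindMemory(to: CGFloat.self, capacity:)` takes a capacity in elements, not bytes, with CGFloat being 8 bytes wide on 64-bit devices. The sketch assumes a buffer created with kCVPixelFormatType_128RGBAFloat and copies row by row, because rows may carry padding. `packedRGBAFloats` and `rgbaFloats(from:)` are hypothetical helper names.

```swift
#if canImport(CoreVideo)
import CoreVideo
#endif

/// Copies `width * 4` Float values per row out of a possibly padded RGBA float bitmap.
/// Assumption: `base` is 4-byte aligned and each row starts `bytesPerRow` bytes after the last.
func packedRGBAFloats(base: UnsafeRawPointer, width: Int, height: Int, bytesPerRow: Int) -> [Float] {
    let floatsPerRow = width * 4
    precondition(bytesPerRow >= floatsPerRow * MemoryLayout<Float>.stride,
                 "bytesPerRow is too small for a 4-channel Float row")
    var result = [Float]()
    result.reserveCapacity(floatsPerRow * height)
    for row in 0..<height {
        let rowStart = (base + row * bytesPerRow).assumingMemoryBound(to: Float.self)
        // Only the first `floatsPerRow` values are pixel data; the rest is padding.
        result.append(contentsOf: UnsafeBufferPointer(start: rowStart, count: floatsPerRow))
    }
    return result
}

#if canImport(CoreVideo)
/// Reads all channels of a 128-bit RGBA float pixel buffer into a packed array.
/// Returns nil when the buffer has a different pixel format.
func rgbaFloats(from buffer: CVPixelBuffer) -> [Float]? {
    guard CVPixelBufferGetPixelFormatType(buffer) == kCVPixelFormatType_128RGBAFloat else {
        return nil
    }
    CVPixelBufferLockBaseAddress(buffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(buffer, .readOnly) }
    guard let base = CVPixelBufferGetBaseAddress(buffer) else { return nil }
    return packedRGBAFloats(base: UnsafeRawPointer(base),
                            width: CVPixelBufferGetWidth(buffer),
                            height: CVPixelBufferGetHeight(buffer),
                            bytesPerRow: CVPixelBufferGetBytesPerRow(buffer))
}
#endif
```

Whether the values that arrive in the buffer are the raw LAB numbers still depends on how the CIContext renders (for example its color management settings), which this sketch does not address.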

PS: I removed the Metal kernel code to fit in here. If you need an RGB > LAB > RGB conversion, send me a message; I'm happy to share the filter.


Solution

  • It's very strange that you get all zeros, especially when you set the format to kCVPixelFormatType_128RGBAFloat...

    However, I highly recommend you check out CIImageProcessorKernel, it's made for this very use case: adding custom (potentially CPU-based) processing steps to a Core Image pipeline. In the process function you get access to the input and output buffers either as MTLTexture, CVPixelBuffer, or even direct access to the baseAddress.

    Here is an example kernel I wrote for computing the mean and variance of the input image using Metal Performance Shaders and returning them in a 2x1 pixel CIImage:

    import CoreImage
    import MetalPerformanceShaders
    
    
    /// Processing kernel that computes the mean and the variance of a given image and stores
    /// those values in a 2x1 pixel return image.
    class MeanVarianceKernel: CIImageProcessorKernel {
    
        override class func roi(forInput input: Int32, arguments: [String : Any]?, outputRect: CGRect) -> CGRect {
            // we need to read the full extent of the input
            return arguments?["inputExtent"] as? CGRect ?? outputRect
        }
    
        override class var outputFormat: CIFormat {
            return .RGBAf
        }
    
        override class var synchronizeInputs: Bool {
            // no need to wait for CPU synchronization since the processing is also happening on the GPU
            return false
        }
    
        /// Convenience method for calling the `apply` method from outside.
        class func apply(to input: CIImage) -> CIImage {
            // pass the extent of the input as argument since we need to know the full extent in the ROI callback above
            return try! self.apply(withExtent: CGRect(x: 0, y: 0, width: 2, height: 1), inputs: [input], arguments: ["inputExtent": input.extent])
        }
    
        override class func process(with inputs: [CIImageProcessorInput]?, arguments: [String : Any]?, output: CIImageProcessorOutput) throws {
            guard
                let commandBuffer = output.metalCommandBuffer,
                let input = inputs?.first,
                let sourceTexture = input.metalTexture,
                let destinationTexture = output.metalTexture
            else {
                return
            }
    
            let meanVarianceShader = MPSImageStatisticsMeanAndVariance(device: commandBuffer.device)
            meanVarianceShader.encode(commandBuffer: commandBuffer, sourceTexture: sourceTexture, destinationTexture: destinationTexture)
        }
    
    }
    

    It can easily be added to a filter pipeline like this:

    let meanVariance: CIImage = MeanVarianceKernel.apply(to: inputImage)
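
To get the numbers back on the CPU, one option is to render the 2x1 result into a small Float array with `CIContext.render(_:toBitmap:rowBytes:bounds:format:colorSpace:)`. The sketch below assumes, as described above, that pixel (0,0) holds the per-channel mean and pixel (1,0) the per-channel variance; the standard deviation is then the square root of the variance. `ChannelStats`, `decodeStats`, and `readStats` are hypothetical names used only for illustration.

```swift
#if canImport(CoreImage)
import CoreImage
#endif

/// Per-channel statistics decoded from the 2x1 RGBAf result image.
struct ChannelStats {
    var mean: [Float]    // one entry per channel, in RGBA order
    var stdDev: [Float]
}

/// Splits 8 floats (pixel 0 = mean, pixel 1 = variance) into mean and standard deviation.
func decodeStats(_ values: [Float]) -> ChannelStats {
    precondition(values.count == 8, "expected two RGBA float pixels")
    let mean = Array(values[0..<4])
    // Clamp tiny negative variances caused by rounding before taking the root.
    let stdDev = values[4..<8].map { max(0, $0).squareRoot() }
    return ChannelStats(mean: mean, stdDev: stdDev)
}

#if canImport(CoreImage)
/// Renders the 2x1 statistics image into CPU memory as 32-bit floats.
func readStats(from image: CIImage, using context: CIContext) -> ChannelStats {
    var values = [Float](repeating: 0, count: 8)
    let rowBytes = values.count * MemoryLayout<Float>.stride
    values.withUnsafeMutableBytes { raw in
        context.render(image,
                       toBitmap: raw.baseAddress!,
                       rowBytes: rowBytes,
                       bounds: CGRect(x: 0, y: 0, width: 2, height: 1),
                       format: .RGBAf,
                       colorSpace: nil) // nil: no color matching on output
    }
    return decodeStats(values)
}

// Usage, assuming `inputImage` is the LAB image and color management is disabled
// so that the LAB values are not treated as colors:
// let context = CIContext(options: [.workingColorSpace: NSNull()])
// let stats = readStats(from: MeanVarianceKernel.apply(to: inputImage), using: context)
#endif
```

Because only two pixels are read back, the CPU copy is tiny compared with copying the full frame into a CVPixelBuffer.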