
ScreenCaptureKit/CVPixelBuffer format yields unexpected results


I have a project that uses ScreenCaptureKit. For reasons outside the scope of this question, I configure ScreenCaptureKit to use the kCVPixelFormatType_32BGRA format -- I need the raw BGRA data, which gets manipulated later on.

When I construct a CGImage or NSImage from the data, displays and some windows look fine (full code included at the bottom of the question -- this is just an excerpt of the conversion).

guard let cvPixelBuffer = sampleBuffer.imageBuffer else { return }
CVPixelBufferLockBaseAddress(cvPixelBuffer, .readOnly)
defer { CVPixelBufferUnlockBaseAddress(cvPixelBuffer, .readOnly) }

let vImageBuffer: vImage_Buffer = vImage_Buffer(data: CVPixelBufferGetBaseAddress(cvPixelBuffer),
                                                height: vImagePixelCount(CVPixelBufferGetHeight(cvPixelBuffer)),
                                                width: vImagePixelCount(CVPixelBufferGetWidth(cvPixelBuffer)),
                                                rowBytes: CVPixelBufferGetWidth(cvPixelBuffer) * 4)

let cgImageFormat: vImage_CGImageFormat = vImage_CGImageFormat(
    bitsPerComponent: 8,
    bitsPerPixel: 32,
    colorSpace: CGColorSpaceCreateDeviceRGB(),
    bitmapInfo: CGBitmapInfo(rawValue: CGImageAlphaInfo.last.rawValue),
    renderingIntent: .defaultIntent
)!

if let cgImage: CGImage = try? vImageBuffer.createCGImage(format: cgImageFormat) {
    let nsImage = NSImage(cgImage: cgImage, size: .init(width: CGFloat(cgImage.width), height: CGFloat(cgImage.height)))
    Task { @MainActor in
        self.image = nsImage
    }
}

The resulting image for displays looks reasonable (apart from the swapped colors, since the incoming data is BGRA while the CGImage is interpreted as RGBA -- that's dealt with elsewhere in my project).
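
For reference, here is a minimal sketch of how that BGRA-to-RGBA swap could be done with vImage. It is not the exact code from my project, and it assumes the buffer is an 8-bit-per-channel, four-channel interleaved buffer:

// Sketch only: swap BGRA -> RGBA in place with a vImage channel permute.
// Destination channel i takes source channel permuteMap[i]:
// R <- source[2], G <- source[1], B <- source[0], A <- source[3]
func swapBGRAToRGBA(_ buffer: inout vImage_Buffer) {
    let permuteMap: [UInt8] = [2, 1, 0, 3]
    let error = vImagePermuteChannels_ARGB8888(&buffer, &buffer, permuteMap, vImage_Flags(kvImageNoFlags))
    assert(error == kvImageNoError)
}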

[Screenshot: a captured display, rendered as expected apart from the swapped colors]

However, some windows (not all) get a very odd distortion and tearing effect. Here's Calendar.app for example:

[Screenshot: the captured Calendar.app window, showing severe distortion and tearing]

Here is Mail.app, which is less broken:

[Screenshot: the captured Mail.app window, with milder distortion]

As far as I can tell, the CVPixelBuffer format is the same in each case. When I inspect the CVPixelBuffer in the debugger (instead of converting it to a CGImage/NSImage), it displays perfectly in Quick Look, so the underlying data isn't damaged either -- there's just something about the format I don't understand.
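
For completeness, here is a small diagnostic sketch (not part of the sample code below) that dumps the layout fields of the incoming buffer from the sample handler:

// Diagnostic sketch only: log the layout of the incoming CVPixelBuffer.
func logPixelBufferLayout(_ pixelBuffer: CVPixelBuffer) {
    let format = CVPixelBufferGetPixelFormatType(pixelBuffer)  // e.g. kCVPixelFormatType_32BGRA
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)
    let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
    let planeCount = CVPixelBufferGetPlaneCount(pixelBuffer)
    print("format: \(format), size: \(width)x\(height), bytesPerRow: \(bytesPerRow), planes: \(planeCount)")
}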

Question:

How can I get the RGBA data reliably from these windows in the same way that it is always returned for displays?


Full, runnable sample code:


import SwiftUI
import AppKit
import Combine
import ScreenCaptureKit
import CoreMedia
import CoreVideo
import Accelerate

class ScreenCaptureManager: NSObject, ObservableObject {
    @Published var availableWindows: [SCWindow] = []
    @Published var availableDisplays: [SCDisplay] = []
    @Published var image: NSImage?
    private var stream: SCStream?
    private let videoSampleBufferQueue = DispatchQueue(label: "com.sample.VideoSampleBufferQueue")
    
    func getAvailableContent() {
        Task { @MainActor in
            do {
                let availableContent: SCShareableContent = try await SCShareableContent.excludingDesktopWindows(true,
                                                                                                                onScreenWindowsOnly: true)
                self.availableWindows = availableContent.windows
                self.availableDisplays = availableContent.displays
            } catch {
                print(error)
            }
        }
    }
    
    func basicStreamConfig() -> SCStreamConfiguration {
        let streamConfig = SCStreamConfiguration()
        streamConfig.minimumFrameInterval = CMTime(value: 1, timescale: 5)
        streamConfig.showsCursor = true
        streamConfig.queueDepth = 5
        streamConfig.pixelFormat = kCVPixelFormatType_32BGRA
        return streamConfig
    }
    
    func startCaptureForDisplay(display: SCDisplay) {
        Task { @MainActor in
            try? await stream?.stopCapture()
            let filter = SCContentFilter(display: display, including: availableWindows)
            let streamConfig = basicStreamConfig()
            streamConfig.width = Int(display.frame.width * 2)
            streamConfig.height = Int(display.frame.height * 2)
            stream = SCStream(filter: filter, configuration: streamConfig, delegate: self)
            do {
                try stream?.addStreamOutput(self, type: .screen, sampleHandlerQueue: videoSampleBufferQueue)
                try await stream?.startCapture()
            } catch {
                print("ERROR: ", error)
            }
        }
    }
    
    func startCaptureForWindow(window: SCWindow) {
        Task { @MainActor in
            try? await stream?.stopCapture()
            let filter = SCContentFilter(desktopIndependentWindow: window)
            let streamConfig = basicStreamConfig()
            streamConfig.width = Int(window.frame.width * 2)
            streamConfig.height = Int(window.frame.height * 2)
            
            stream = SCStream(filter: filter, configuration: streamConfig, delegate: self)
            do {
                try stream?.addStreamOutput(self, type: .screen, sampleHandlerQueue: videoSampleBufferQueue)
                try await stream?.startCapture()
            } catch {
                print(error)
            }
        }
    }
}

extension ScreenCaptureManager: SCStreamOutput, SCStreamDelegate {
    func stream(_: SCStream, didOutputSampleBuffer sampleBuffer: CMSampleBuffer, of _: SCStreamOutputType) {
        guard let cvPixelBuffer = sampleBuffer.imageBuffer else { return }
        
        print("PixelBuffer", cvPixelBuffer)
        
        CVPixelBufferLockBaseAddress(cvPixelBuffer, .readOnly)

        defer {
            CVPixelBufferUnlockBaseAddress(cvPixelBuffer, .readOnly)
        }
        
        let vImageBuffer: vImage_Buffer = vImage_Buffer(data: CVPixelBufferGetBaseAddress(cvPixelBuffer),
                                                        height: vImagePixelCount(CVPixelBufferGetHeight(cvPixelBuffer)),
                                                        width: vImagePixelCount(CVPixelBufferGetWidth(cvPixelBuffer)),
                                                        rowBytes: CVPixelBufferGetWidth(cvPixelBuffer) * 4)
        
        let cgImageFormat: vImage_CGImageFormat = vImage_CGImageFormat(
                    bitsPerComponent: 8,
                    bitsPerPixel: 32,
                    colorSpace: CGColorSpaceCreateDeviceRGB(),
                    bitmapInfo: CGBitmapInfo(rawValue: CGImageAlphaInfo.last.rawValue),
                    renderingIntent: .defaultIntent
                )!
        if let cgImage: CGImage = try? vImageBuffer.createCGImage(format: cgImageFormat) {
            let nsImage = NSImage(cgImage: cgImage, size: .init(width: CGFloat(cgImage.width), height: CGFloat(cgImage.height)))
            Task { @MainActor in
                self.image = nsImage
            }
        }
    }

    func stream(_: SCStream, didStopWithError error: Error) {
        print("JN: Stream error", error)
    }
}

struct ContentView: View {
    @StateObject private var screenCaptureManager = ScreenCaptureManager()
    
    var body: some View {
        HStack {
            ScrollView {
                ForEach(screenCaptureManager.availableDisplays, id: \.displayID) { display in
                    HStack {
                        Text("Display: \(display.width) x \(display.height)")
                    }.frame(height: 60).frame(maxWidth: .infinity).border(Color.black).contentShape(Rectangle())
                    .onTapGesture {
                        screenCaptureManager.startCaptureForDisplay(display: display)
                    }
                }
                ForEach(screenCaptureManager.availableWindows.filter { $0.title != nil && !$0.title!.isEmpty }, id: \.windowID) { window in
                    HStack {
                        Text(window.title!)
                    }.frame(height: 60).frame(maxWidth: .infinity).border(Color.black).contentShape(Rectangle())
                    .onTapGesture {
                        screenCaptureManager.startCaptureForWindow(window: window)
                    }
                }
            }
            .frame(width: 200)
            Divider()
            
            if let image = screenCaptureManager.image {
                Image(nsImage: image)
                    .resizable()
                    .aspectRatio(contentMode: .fit)
                    .frame(maxWidth: .infinity, maxHeight: .infinity)
            }
        }
        .frame(width: 800, height: 600, alignment: .leading)
        .onAppear {
            screenCaptureManager.getAvailableContent()
        }
    }
}

(Note: I know that displaying an NSImage of the captured content is not the most efficient way to preview it -- it's only used here to demonstrate the issue.)


Solution

  • ScreenCaptureKit can return CVPixelBuffers (via CMSampleBuffer) that have padding bytes at the end of each row. The problematic line in my code was:

    rowBytes: CVPixelBufferGetWidth(cvPixelBuffer) * 4
    

    This line assumed that rowBytes would simply be the image width multiplied by 4, since 32-bit formats such as BGRA use four bytes per pixel.

    This line should have been:

    rowBytes: CVPixelBufferGetBytesPerRow(cvPixelBuffer)
    

    This value can vary depending on the hardware, as described in Apple's Technical Q&A QA1829.

    It seems that a hardware-aligned rowBytes value allows for some efficiencies when the data is moved around quickly on the local machine. However, when the data is sent elsewhere (over a network, for example), the destination will usually expect tightly packed rows, which means you have to copy each row without the trailing padding bytes before transferring the data (see the sketch below).
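
    As an illustration, here is a rough sketch of how such a tightly packed copy could look. The helper below is hypothetical and not taken from my project:

    // Hypothetical helper (sketch only): copy a 32-bit-per-pixel CVPixelBuffer
    // into a tightly packed Data value with width * 4 bytes per row and no padding.
    func tightlyPackedPixelData(from pixelBuffer: CVPixelBuffer) -> Data? {
        CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }

        guard let baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer) else { return nil }
        let width = CVPixelBufferGetWidth(pixelBuffer)
        let height = CVPixelBufferGetHeight(pixelBuffer)
        let sourceBytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)  // may include padding
        let packedBytesPerRow = width * 4                                 // tightly packed

        var packed = Data(capacity: packedBytesPerRow * height)
        for row in 0..<height {
            // Copy only the meaningful bytes of each row, skipping the padding at the end.
            let rowStart = baseAddress.advanced(by: row * sourceBytesPerRow)
            packed.append(UnsafeBufferPointer(start: rowStart.assumingMemoryBound(to: UInt8.self),
                                              count: packedBytesPerRow))
        }
        return packed
    }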