avfoundation, video-processing, avkit, video-toolbox

Decoding ProRes RAW format to native Bayer representation


I am trying to decode a ProRes RAW video file, but it doesn't work. I always get:

Optional(Error Domain=AVFoundationErrorDomain Code=-11821 "Cannot Decode" UserInfo={NSLocalizedFailureReason=The media data could not be decoded. It may be damaged., NSLocalizedDescription=Cannot Decode, NSUnderlyingError=0x600002a982a0 {Error Domain=NSOSStatusErrorDomain Code=-12137 "(null)"}})

Here is the full decoder:

import AVFoundation
import VideoToolbox

class Decoder {

    private let assetReader: AVAssetReader?
    private let output: AVAssetReaderTrackOutput

    init() throws {

        VTRegisterProfessionalVideoWorkflowVideoDecoders()
        VTRegisterProfessionalVideoWorkflowVideoEncoders()

        // movieAsset and outputSettings are set up elsewhere (not shown)
        let assetReader = try AVAssetReader(asset: movieAsset)
        let tracks = movieAsset.tracks(withMediaType: .video)

        guard let firstTrack = tracks.first else {
            print("No video tracks found")
            throw NSError()
        }

        let out = AVAssetReaderTrackOutput(track: firstTrack, outputSettings: outputSettings)
        out.alwaysCopiesSampleData = true

        assetReader.add(out)

        self.assetReader = assetReader
        self.output = out
    }

    func run() {
        guard let assetReader = assetReader, assetReader.startReading() else {
            print("Failed to start asset reader")
            return
        }

        while assetReader.status == .reading {
            guard let sampleBuffer = output.copyNextSampleBuffer() else {
                print(assetReader.status.rawValue)
                print(assetReader.error)
                continue
            }

            print("Decoding success!")
        }
    }
}


Solution

  • It's not clear why you want Bayer, and I'm not sure what you mean by "native", but I guess you might want your data to be:

    1. at its highest possible definition, or
    2. in its most natural / efficient / least processed format
    3. just Bayer, don't ask me any more questions

    So there are two possibilities, I think.

    If you like high definition data, try setting your AVAssetReaderTrackOutput pixel format to kCVPixelFormatType_444YpCbCr16VideoRange_16A_TriPlanar, kCVPixelFormatType_4444AYpCbCr16 or kCVPixelFormatType_64RGBALE, or one of the other formats mentioned in the AVAssetReaderTrackOutput documentation. I'd think the chances are good that AVAssetReader won't gratuitously truncate the data.
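
    For example (a sketch only, reusing firstTrack from your init), requesting one of those wide formats just means passing a pixel format through outputSettings; kCVPixelFormatType_64RGBALE here is only one of the candidates above:

    let outputSettings: [String: Any] = [
        // one of the wide formats listed above -- swap in another if the reader rejects it
        kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_64RGBALE
    ]
    let out = AVAssetReaderTrackOutput(track: firstTrack, outputSettings: outputSettings)
    out.alwaysCopiesSampleData = true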

    I have no idea about natural or efficient representations when working with ProRes RAW, but if you really want Bayer output, you can set your outputSettings to nil and use a VTDecompressionSession to convert the raw sample buffers to kCVPixelFormatType_16VersatileBayer (or kCVPixelFormatType_64RGBAHalf or kCVPixelFormatType_128RGBAFloat, if you're still into high-range formats that AVAssetReader dislikes for some reason), but not kCVPixelFormatType_64RGBA_DownscaledProResRAW, as that doesn't seem to work.

    Anyway, you could lightly modify your code to decode to kCVPixelFormatType_16VersatileBayer like so:

    import AVFoundation
    import VideoToolbox
    
    class Decoder {
        private let assetReader: AVAssetReader?
        private let output: AVAssetReaderTrackOutput
        private var decompressionSession: VTDecompressionSession!
            
        init() throws {
            let movieUrl = URL(fileURLWithPath: "/Users/xxxx/ProresRAW_Video.MOV")
            let movieAsset = AVAsset(url: movieUrl)
            
            do {
                let assetReader = try AVAssetReader(asset: movieAsset)
                let tracks = movieAsset.tracks(withMediaType: .video)
                
                guard let firstTrack = tracks.first else {
                    print("No video tracks found")
                    throw NSError()
                }
                
                let out = AVAssetReaderTrackOutput(track: firstTrack, outputSettings: nil)
                out.alwaysCopiesSampleData = true
                
                assetReader.add(out)
                
                self.assetReader = assetReader
                self.output = out
                
            } catch {
                print(error)
                throw error
            }
            
        }
        
        func run() {
            guard let assetReader = assetReader, assetReader.startReading() else {
                print("Failed to start asset reader")
                return
            }
            
            while(assetReader.status == .reading) {
                guard let sampleBuffer = output.copyNextSampleBuffer() else {
                    print(assetReader.status.rawValue)
                    print(assetReader.error)
                    continue
                }
                
                print("Decoding success! \(sampleBuffer)")
                
                if let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer) {
                    if decompressionSession == nil {
                        let imageBufferAttributes: [CFString: Any] = [
                            kCVPixelBufferPixelFormatTypeKey: kCVPixelFormatType_16VersatileBayer
                        ]
                        var outputCallback = VTDecompressionOutputCallbackRecord(decompressionOutputCallback: { _, _, status, infoFlags, imageBuffer, presentationTimeStamp, presentationDuration in
                            assert(noErr == status)
                            print("decode callback status: \(status), bayer imageBuffer \(String(describing: imageBuffer)), flags: \(infoFlags), pts: \(presentationTimeStamp), duration: \(presentationDuration)")
                        }, decompressionOutputRefCon: nil)
                        let status = VTDecompressionSessionCreate(allocator: nil, formatDescription: formatDescription, decoderSpecification: nil, imageBufferAttributes: imageBufferAttributes as CFDictionary, outputCallback: &outputCallback, decompressionSessionOut: &decompressionSession)
                        assert(noErr == status)
                    }
    
                    let status = VTDecompressionSessionDecodeFrame(decompressionSession, sampleBuffer: sampleBuffer, flags: [], frameRefcon: nil, infoFlagsOut: nil)
                    assert(noErr == status)
                }
            }
        }
    }
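
    If you then want to get at the Bayer samples themselves, here's a rough sketch (my addition, untested against real ProRes RAW footage) of a helper you could call from the decompression output callback with the imageBuffer it hands you; it assumes kCVPixelFormatType_16VersatileBayer arrives as a single plane of 16-bit mosaic samples:

    import CoreVideo

    // Hypothetical helper, not part of the decoder above: log the geometry and the
    // first sample of a decoded Bayer buffer, assuming a single 16-bit plane.
    func dumpBayerBuffer(_ buffer: CVImageBuffer) {
        CVPixelBufferLockBaseAddress(buffer, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(buffer, .readOnly) }

        let width = CVPixelBufferGetWidth(buffer)
        let height = CVPixelBufferGetHeight(buffer)
        let bytesPerRow = CVPixelBufferGetBytesPerRow(buffer)

        if let base = CVPixelBufferGetBaseAddress(buffer) {
            let samples = base.assumingMemoryBound(to: UInt16.self)
            print("bayer buffer: \(width)x\(height), \(bytesPerRow) bytes/row, first sample: \(samples[0])")
        }
    }

    Inside the output callback you could then replace the print with something like if let imageBuffer = imageBuffer { dumpBayerBuffer(imageBuffer) }.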
    

    The thing I don't get is why AVAssetReader, which probably uses VTDecompressionSession under the hood, doesn't simply let you request kCVPixelFormatType_16VersatileBayer in the first place. Maybe it's bloody-mindedness, or maybe it doesn't make sense? P.S. What are you trying to do?